The question is in the title; the reason is not clear to me. Here is an example script:

    -- Run in the data_mining queue and tolerate up to 5% failed map tasks
    SET mapred.job.queue.name 'data_mining';
    SET mapreduce.map.failures.maxpercent 5;

    -- Read the info:urlN column from HBase and write it to HDFS as tab-separated text
    raw = LOAD 'hbase://xxx' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:urlN') AS (urlN:chararray);
    STORE raw INTO 'links' USING PigStorage('\t');
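
For completeness, this is roughly how the script is run and how I look at the result afterwards; links.pig is just a hypothetical name for the script file, and /user/urvanov follows from Pig resolving the relative 'links' output path against the user's HDFS home directory:

    # Submit the script to the cluster
    pig -f links.pig

    # Relative STORE paths resolve under the user's HDFS home directory
    hdfs dfs -ls /user/urvanov/links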

And here is the log:

HadoopVersion    PigVersion       UserId   StartedAt            FinishedAt           Features
2.6.0-cdh5.4.1   0.12.0-cdh5.4.1  urvanov  2018-08-03 22:45:56  2018-08-04 05:15:58  UNKNOWN

Success!

Job Stats (time in seconds):
JobId  Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReduceTime  Alias  Feature  Outputs
job_1517304584149_185731  2690  0  20492  3840  8674  8661  n/a  n/a  n/a  n/a  raw  MAP_ONLY  hdfs://ix-jupiter/user/urvanov/collection_1,

Input(s): Successfully read 246642204846 records (1137790 bytes) from: "hbase://collection_1"

Output(s): Successfully stored 246642204846 records (29915201763044 bytes) in: "hdfs://ix-jupiter/user/urvanov/collection_1"

Counters:
Total records written: 246642204846
Total bytes written: 29915201763044
Spillable Memory Manager spill count: 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_1517304584149_185731

2018-08-04 05:15:59,998 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2018-08-04 05:16:00,921 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

1 answer

There is no problem as such. It turned out that after the Pig script had run and written the data to HDFS, the listing for some reason showed no size for the subdirectories (something I hadn't known). In this case links and links_time actually weigh 1 TB and 9 TB, but no size was shown for them, and that misled me.
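
For anyone hitting the same thing: the NameNode file browser does not display sizes for directory entries, but the HDFS shell can summarize them. A minimal sketch, assuming the outputs live under /user/urvanov as in the log above:

    # Per-entry totals; a directory's size includes everything inside it
    hdfs dfs -du -h /user/urvanov

    # Single summarized total for one output directory, e.g. 'links'
    hdfs dfs -du -s -h /user/urvanov/links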
