How do I read a CSV file from HDFS?
I am passing file.csv via DistributedCache and need to build 4 arrays in
setup() from its contents, which look like this:
|A|,|101|,|PE1|,|MA1|
|B|,|102|,|PE2|,|MA2|
|C|,|103|,|PE3|,|MA3|
|D|,|104|,|PE4|,|MA4|
Path[] cacheFile = new Path[0];
String[][] tableData;

@Override
public void setup(Context context) {
    try {
        // Local paths of the files registered with the DistributedCache
        cacheFile = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    } catch (IOException e) {
        e.printStackTrace();
    }
    // read file.csv (cacheFile) and load the data into tableData[][]
}
I think I can load the csv file myself using split(",") and then
replaceAll("|", ""). But I don't know how to first open and read the
content of file.csv, as now it is in HDFS. Any suggestion would be of
great help. Thanks
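For reference, here is a minimal sketch of the loading and parsing I have in mind. It assumes the cached path can be treated as an ordinary local file, since DistributedCache.getLocalCacheFiles() hands back paths on the local disk of the task node; the class and method names (CsvCacheLoader, loadTable, parseLine) are just placeholders of mine, not anything from the Hadoop API:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CsvCacheLoader {

    // Read a pipe-quoted CSV from a *local* path (which is what
    // getLocalCacheFiles() resolves to) and return one String[] per row.
    static String[][] loadTable(String localPath) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(localPath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(parseLine(line));
            }
        }
        return rows.toArray(new String[0][]);
    }

    // Split on commas, then strip the surrounding pipes from each field.
    // String.replace is used instead of replaceAll because replaceAll
    // interprets its first argument as a regex, where "|" is alternation.
    static String[] parseLine(String line) {
        String[] fields = line.split(",");
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace("|", "").trim();
        }
        return fields;
    }
}
```

In setup() the idea would then be to call loadTable(cacheFile[0].toString()) and assign the result to tableData.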