A Scala productivity framework for Hadoop.
I am fairly new to Scala, Hadoop and Scoobi.
We have some Hadoop jobs that process CSV
files using the usual Scoobi routines:
// Parse the input file
val lines = fromTextFile(input)
// Iterate on every element to generate the keys, and then aggregate it
val counts = lines.mapFlatten( ...
1. I have the impression that I can't do the same for XML files. Is that so, or can I process XML with Scoobi?
2. I think I can parse the XML and flatten its nodes into lines with Scala's native XML support. But how do I then create a Scoobi DList from them?
(Why? Because I will need to join it with another DList that comes from a CSV file.)
Note: My XML consists of nodes like the following:
<add>
  <AdCampaign class="BCSAdCampaign">
    <Subscriber>TVC</Subscriber>
    <CampaignName>3402376</CampaignName>
    <CampaignId>1NTGXNAY</CampaignId>
    <AccountManager/>
    <FromDate>20130212</FromDate>
    <ToDate>20140207</ToDate>
    <ReportingInd>N</ReportingInd>
    <CampaignAdmin>NAWASTHI MCG-TVC</CampaignAdmin>
    <SalesChannel>TC8</SalesChannel>
    <Email/>
    <Advertiser>MU0</Advertiser>
    <Date>20150609</Date>
  </AdCampaign>
</add>
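To make point 2 concrete, the flattening step I have in mind might look roughly like the sketch below. It only covers turning each AdCampaign node into one delimited line with scala.xml; it does not touch Scoobi at all. The object name FlattenCampaigns, the choice of fields and the comma delimiter are my own assumptions for illustration, and it assumes the scala-xml module is on the classpath (it is a separate dependency on newer Scala versions).

```scala
import scala.xml.XML

// Hypothetical helper, not part of Scoobi: flattens <AdCampaign> nodes to CSV-like lines.
object FlattenCampaigns {
  // Child elements to extract, in output column order (names taken from the sample XML).
  val fields = Seq("Subscriber", "CampaignName", "CampaignId", "FromDate", "ToDate", "Advertiser")

  // Parse the document and emit one comma-separated line per <AdCampaign> element.
  // Empty elements such as <AccountManager/> yield an empty string.
  // Note: values are not escaped, so a value containing a comma would break the format.
  def flatten(xmlText: String): List[String] = {
    val root = XML.loadString(xmlText)
    (root \\ "AdCampaign").toList.map { node =>
      fields.map(f => (node \ f).text.trim).mkString(",")
    }
  }
}
```

If something like this works, one option would presumably be to write the resulting lines to a text file and read them back with fromTextFile, as in the CSV job, but I don't know whether that is the recommended route.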
Source: (StackOverflow)