Sqoop split by string column

Rupesh Kumar Singh
Sep 20, 2020


When Sqoop splits on a string column, the splitter has to determine the split points between two strings: the lowest and highest values of that column. When those strings are ‘A’ and ‘Z’, this is not hard: it could create two splits, [‘A’, ‘M’) and [‘M’, ‘Z’], or 26 splits, one for the strings beginning with each letter, and so on.

Ex1. The split column's minimum and maximum values are A and Z. With 3 mappers there are 3 splits: A-I in part-00000, J-Q in part-00001, R-Z in part-00002. With 4 mappers there are 4 splits: A-G in part-00000, H-M in part-00001, N-S in part-00002, T-Z in part-00003.

Ex2. The split column's minimum and maximum values are A and D. With 2 mappers there are 2 splits: A-B in part-00000, C-D in part-00001.

Ex3. The split column's minimum and maximum values are A and C. With 3 mappers there are 3 splits: A in part-00000, B in part-00001, C in part-00002. If the table has no rows with the value B, part-00001 is empty.
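
The letter ranges in these examples can be reproduced with a small Python sketch. It is a simplified model, not the actual TextSplitter code: it assumes single-character values and spaces the split points evenly between the character codes of the lowest and highest value. The function names `split_points`, `assign_split` and `letters_per_split` are made up for this illustration.

```python
from fractions import Fraction


def split_points(lo, hi, num_splits):
    """Evenly spaced boundaries between the character codes of lo and hi.

    Assumption: lo and hi are single characters. The real splitter works on
    whole strings; this is only a simplified model of the idea.
    """
    start, end = ord(lo), ord(hi)
    step = Fraction(end - start, num_splits)
    return [start + i * step for i in range(num_splits + 1)]


def assign_split(value, points):
    """Return the index of the split that a single-character value falls into.

    Every split is half-open [low, high) except the last one, which also
    includes the highest value.
    """
    code = ord(value)
    for i in range(len(points) - 1):
        if points[i] <= code < points[i + 1]:
            return i
    if code == points[-1]:
        return len(points) - 2
    raise ValueError("value is outside the split range")


def letters_per_split(lo, hi, num_splits):
    """Group every character from lo to hi by the split it lands in."""
    points = split_points(lo, hi, num_splits)
    groups = [[] for _ in range(num_splits)]
    for code in range(ord(lo), ord(hi) + 1):
        groups[assign_split(chr(code), points)].append(chr(code))
    return ["".join(group) for group in groups]


if __name__ == "__main__":
    for lo, hi, mappers in [("A", "Z", 3), ("A", "Z", 4), ("A", "D", 2), ("A", "C", 3)]:
        print(lo, hi, mappers, letters_per_split(lo, hi, mappers))
```

Calling `letters_per_split("A", "Z", 3)` groups the letters as A-I, J-Q and R-Z, the same ranges as in Ex1. Real column values are usually longer than one character, so the actual boundaries depend on the full strings and can differ from this model.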

Reference: http://hadoop.apache.org/docs/r2.10.0/api/org/apache/hadoop/mapreduce/lib/db/TextSplitter.html
