What's the Sqoop on Mainframe?
Someone asked, "How do we pull data from a mainframe for analysis?" At the time, I got a lot of puzzled looks.
Apache Sqoop has logic built in (SQOOP-2938) to map mainframe data sets onto the FTP server's folder layout transparently, allowing the user to simply specify the --datasettype and --dataset parameters.
Pulling data from a mainframe can sound daunting. There are a couple of ways to approach it:-
- DB2 (IBM Database 2) connection
  - Can be more computationally taxing on the mainframe, as it has to process the DB2 SQL every time the desired results are retrieved.
- File Transfer Protocol (FTP)
  - Simple file transfer
  - Less computationally expensive if the data set can be extracted predictably.
The mainframe FTP server doesn't behave quite like most standard FTP servers. Here are some of the differences:-
- The folder hierarchy is separated by periods (.).
- Folders and files are usually referenced in single quotes, e.g. 'folder1.folder2'.
- The logical type of the last item in the hierarchy depends on the data set type:-
  - Sequential Data Set - the last item is a file.
  - Partitioned Data Set (PDS) - the last item is a folder.
  - Generation Data Group (GDG) - the last item is a folder.
Figure: Data set types, example data set names and their corresponding file system mapping on the FTP server.
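To make the mapping above concrete, here is a minimal shell sketch of the FTP client steps each data set type implies. It assumes a z/OS-style FTP server where cd to a quoted name makes that name the working prefix and relative names are resolved against it. The function name mf_ftp_target and the data set names are hypothetical, and this is an illustration of the mapping, not how Sqoop itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop): print the FTP client steps
# implied by a data set type and a fully qualified data set name.
# Assumption: on the mainframe FTP server, cd to a quoted name makes it
# the working prefix, and relative names are resolved against it.
mf_ftp_target() {
  dstype=$1
  dsname=$2
  case $dstype in
    s)
      # Sequential: the last qualifier is a file inside the parent "folder".
      case $dsname in
        *.*) ;;
        *) echo "sequential name needs at least two qualifiers: $dsname" >&2
           return 1 ;;
      esac
      printf "cd '%s'\nget %s\n" "${dsname%.*}" "${dsname##*.}"
      ;;
    p|g)
      # Partitioned / generation data group: the whole name is a folder,
      # so change into it and list its members or generations.
      printf "cd '%s'\nls\n" "$dsname"
      ;;
    *)
      echo "unknown data set type: $dstype" >&2
      return 1
      ;;
  esac
}

mf_ftp_target s SOME.SEQ.DATA   # cd 'SOME.SEQ' then get DATA
mf_ftp_target g SOME.GDG        # cd 'SOME.GDG' then ls
```

The split uses plain POSIX parameter expansion: ${dsname%.*} drops the last qualifier to get the parent folder, and ${dsname##*.} keeps only the last qualifier as the file name.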
The behaviours of each --datasettype setting are as follows:-
- 'p' - partitioned data set. This retrieves ALL the files in the folder; the resulting output is multiple files.
- 'g' - generation data group. This retrieves the 'latest' file in the data group, determined by lexical order (the last GDG file in the FTP folder listing); the resulting output is a single file.
- 's' - sequential dataset. This retrieves a single file from the FTP server.
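The 'p' and 'g' selection rules above can be sketched in a few lines of shell. This is an illustration of the stated rules, not Sqoop's code: the hypothetical select_files function reads a folder listing on stdin, passes every entry through for 'p', and keeps only the last entry in byte-wise lexical order for 'g'. For generation names in the usual zero-padded GnnnnVnn form, lexical order matches generation order.

```shell
#!/bin/sh
# Hypothetical sketch (not Sqoop's code): read an FTP folder listing on
# stdin and print the entries a --datasettype setting would retrieve.
select_files() {
  case $1 in
    p)
      # Partitioned data set: every member in the folder.
      cat
      ;;
    g)
      # Generation data group: only the last entry in byte-wise lexical order.
      LC_ALL=C sort | tail -n 1
      ;;
    *)
      echo "usage: select_files p|g" >&2
      return 1
      ;;
  esac
}

printf '%s\n' G0001V00 G0003V00 G0002V00 | select_files g   # prints G0003V00
```

Setting LC_ALL=C forces byte-wise comparison, so the chosen "latest" file does not depend on the current locale's collation rules.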
From the Sqoop documentation pages, here is an example of importing data from a mainframe:-
```shell
sqoop import-mainframe --dataset SomeGdg --connect <host> \
  --username myuser --password-alias mypasswordalias \
  --datasettype g --tape true \
  --outdir /tmp/imported/sqoop \
  --target-dir /data/imported/mainframe/SomeGdg
```
This command will do the following:-
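- Open an FTP connection to <host> and log in as myuser, using the password stored under the alias mypasswordalias.
- Change the working directory into the SomeGdg folder and retrieve the latest generation data file (--datasettype g); --tape true indicates the data set resides on tape.
- Write the imported data to /data/imported/mainframe/SomeGdg (--target-dir) and the generated Java code to /tmp/imported/sqoop (--outdir).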
