
Posts

Featured

What's the Sqoop on Mainframe?

Someone asked, "How do we pull data from a mainframe for analysis?" At the time, I got a lot of puzzled looks. Pulling data from a mainframe can sound daunting. There are a couple of ways to approach it:

- DB2 (IBM Database 2) connection: can be more computationally taxing on the mainframe, since it has to process the DB2 SQL every time the desired results are retrieved.
- File Transfer Protocol (FTP): a simple file transfer that is less computationally expensive, provided the data set can be extracted predictably.

Apache Sqoop can pull data out of a mainframe using either mechanism. FTP is the method I've used before when pulling mainframe datasets into Hadoop. The mainframe FTP server doesn't behave exactly like most standard FTP servers; here are some of the differences:

- the folder hierarchy is separated by periods/dots (.).
- the syntax to reference folders/files usually uses quotes, e.g. 'folder1.folder2'.
- the logical type of the last item in th...
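To make the quoting convention concrete, here is a minimal Python sketch. It assumes only the two rules listed above (period-separated qualifiers, full name wrapped in single quotes) plus the standard z/OS naming limits (each qualifier 1 to 8 characters, whole name at most 44). The helpers `mainframe_dataset_name` and `retr_command` are hypothetical and not part of Sqoop or any mainframe tooling. The commented-out `ftplib` lines show how such a command could be sent. The host and credentials there are placeholders.

```python
def mainframe_dataset_name(*qualifiers: str) -> str:
    """Build a quoted, fully qualified dataset reference such as 'folder1.folder2'.

    Hypothetical helper: joins qualifiers with periods (the mainframe's
    folder separator) and wraps the result in single quotes.
    """
    if not qualifiers:
        raise ValueError("at least one qualifier is required")
    for q in qualifiers:
        # z/OS qualifiers are 1-8 characters and cannot contain a period themselves
        if not q or len(q) > 8 or "." in q:
            raise ValueError(f"invalid qualifier: {q!r}")
    name = ".".join(qualifiers)
    # z/OS dataset names are limited to 44 characters, periods included
    if len(name) > 44:
        raise ValueError(f"dataset name too long: {name!r}")
    return f"'{name}'"


def retr_command(*qualifiers: str) -> str:
    """Build an FTP RETR command for a dataset, e.g. RETR 'folder1.folder2'."""
    return f"RETR {mainframe_dataset_name(*qualifiers)}"


# Hypothetical usage against a real server (host and credentials are placeholders):
# from ftplib import FTP
# with FTP("mainframe.example.com") as ftp:
#     ftp.login("USER", "PASSWORD")
#     records = []
#     ftp.retrlines(retr_command("folder1", "folder2"), records.append)

print(retr_command("folder1", "folder2"))  # RETR 'folder1.folder2'
```

Quoting the full name matters because a z/OS FTP server typically treats an unquoted name as relative to the current working prefix, so the same qualifiers could resolve to a different dataset.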

Latest posts

Hello World