PowerBuilder – Data Load/ETL process

Posted on Wednesday, August 18th, 2010 at 8:20 pm

One common piece of functionality in many business applications is the ability to import data from an external source. This could be anything from invoices and orders to inventory forecasts and patient medical histories. Quite often, if this type of process is not considered in a generic context, you end up with a solution that is too specific and suited only to the immediate need. As time goes on you can end up ‘reinventing the wheel’ many times over, leading to a maintenance nightmare.

If you are lucky enough to be dealing with clients using some sort of standard (EDI, HL7, etc.), or if you are able to establish the ‘standard’ file layout yourself, you have cleared a major hurdle. At that point you can develop the process of getting the ‘messages’ (raw data records) into the database. In general, my vote for this phase is a process consisting of stored procedures triggered by some type of scheduled job. Remember, at this phase you are only importing the ‘raw’ data into a preliminary set of ‘raw data tables’; no business rule validation is taking place.
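If you ever need to kick that load off on demand from the application itself (say, from an admin screen), the stored procedure can be called through embedded SQL. A minimal sketch, assuming a hypothetical procedure named up_import_raw_file and the default SQLCA transaction object:

// declare and run the (hypothetical) import procedure against SQLCA
DECLARE proc_import PROCEDURE FOR up_import_raw_file USING SQLCA;

EXECUTE proc_import;
IF SQLCA.SQLCode <> 0 THEN
	// the call failed; surface the database error
	MessageBox ( 'Import', SQLCA.SQLErrText )
END IF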

From an auditing standpoint, I tend to keep copies of the data at each stage of the process. By this I mean I copy the original file to an archive location, import the file into a raw data table, process the raw data into another table where datatype checking and message ‘completeness’ (proper length, correct number of segments, etc.) are verified, and then bring the records into the data tables of the application itself (applying business logic along the way). This way, if there is any question about what happened to a given transaction, I can trace its path step by step.

General Process Overview

[file] & [file identifier table] >
	File contains the data from source as received/retrieved
	Files stored in unique folder (date based) with unique name
	File identifier table has simple structure:
		uniquekey
		sourceid
		datetimereceived
		filename
		location
		processedflag

	Process needs to (a sketch follows this list):
		monitor for files
		create destination location
		copy file to destination location
		erase file from source
		insert record into file identifier table
		possibly send acknowledgement back to file originator
		possibly launch next step
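
	A minimal PowerScript sketch of this step, assuming a PowerBuilder release with FileExists()/CreateDirectory()/FileCopy() (older versions would call the Win32 API instead), the column names from the outline above, and hypothetical paths and source id; key generation is left to whatever your shop uses:

string   ls_source, ls_folder, ls_dest
long     ll_key
datetime ldt_now

ls_source = 'c:\inbound\orders.txt'                          // hypothetical pickup location
ls_folder = 'c:\archive\' + String ( Today(), 'yyyymmdd' )   // date-based archive folder
ls_dest   = ls_folder + '\' + String ( Now(), 'hhmmss' ) + '_orders.txt'
ldt_now   = DateTime ( Today(), Now() )

IF FileExists ( ls_source ) THEN
	CreateDirectory ( ls_folder )
	IF FileCopy ( ls_source, ls_dest, FALSE ) = 1 THEN
		FileDelete ( ls_source )
		ll_key = 1   // however your shop generates keys
		// register the file so the next step can find it
		INSERT INTO file_identifier
		       ( uniquekey, sourceid, datetimereceived, filename, location, processedflag )
		VALUES ( :ll_key, 'ACME', :ldt_now, 'orders.txt', :ls_folder, 'N' ) ;
		COMMIT ;
	END IF
END IF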

[raw data table] >
	Has a single entry for each record in the file
	Simple structure:
		uniquekey
		sourceid
		fileidentifierkey (uniquekey from file identifier table)
		datetimeentered (when the record got into this table)
		data (single record from file)
		possible sequence number if multiples are sent from source in a single file
		processedflag

	If the incoming data arrives as many single-record transmissions, you may want to eliminate
	the file and file identifier table stage and simply put the messages here.

	Process needs to (a sketch follows this list):
		read file identifier table for unprocessed records
		open file
		parse file and insert individual records into raw data table
		mark record in file identifier table as processed
		possibly send error back to file originator
		possibly launch next step
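
	A minimal sketch of the parse step, assuming one message per line, the table layouts above (with uniquekey generated by the database and a seqno column for the sequence number), and that ls_path, ls_sourceid, and ll_file_key were pulled from the unprocessed file identifier record:

integer  li_file
long     ll_rc, ll_seq
string   ls_line
datetime ldt_now

ldt_now = DateTime ( Today(), Now() )
li_file = FileOpen ( ls_path, LineMode! )
IF li_file = -1 THEN RETURN   // could not open; log it and notify the originator if appropriate

ll_rc = FileRead ( li_file, ls_line )
DO WHILE ll_rc <> -100   // -100 = end of file
	IF ll_rc > 0 THEN
		ll_seq = ll_seq + 1
		INSERT INTO raw_data
		       ( sourceid, fileidentifierkey, datetimeentered, data, seqno, processedflag )
		VALUES ( :ls_sourceid, :ll_file_key, :ldt_now, :ls_line, :ll_seq, 'N' ) ;
	END IF
	ll_rc = FileRead ( li_file, ls_line )
LOOP
FileClose ( li_file )

// flag the file so it is not picked up again
UPDATE file_identifier
   SET processedflag = 'Y'
 WHERE uniquekey = :ll_file_key ;
COMMIT ;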

[transfer table(s)] >
	Temporary holding area for data being loaded into the application.
	Records marked as processed are removed prior to the next load
	May be several tables if the raw data contains repeating segments (EDI/Gentran-type transmissions)
	Columns use the actual datatypes of the destination data (not raw strings)
	Table structure can be complex:
		uniquekey
		rawdatakey (uniquekey from raw data table)
		datetimeentered
		sourceid
		(whatever columns are needed to hold data)
		possible sequence number if multiples are sent from source in a single file
		processedflag

	Child or ancillary tables will need to have a foreign key relation to the master table

	Process needs to (a sketch follows this list):
		Remove rows marked as processed for same sourceid
		Get records from the raw data table for the sourceid which have not been processed
		Validate datatypes
		Write entry to error/log table if a record is incorrect
		Validate 'completeness' of data (all required segments, length of data, etc.)
		Write entry to error/log table if a record is incorrect
		Mark raw data table record as processed
		Possibly send errors back to file originator
		Possibly launch next step
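
	The datatype checks map nicely onto PowerScript's conversion tests (IsDate(), IsNumber(), and friends). A minimal sketch of validating one raw record, assuming it runs inside a loop over the unprocessed raw_data rows (ll_raw_key, ls_sourceid, and ldt_now already set), a delimited message whose second field should be a date and third a quantity, and hypothetical transfer_orders and err_log tables:

string ls_field[]
date   ld_order
long   ll_qty

// split the raw message on its delimiter; a simple Pos()/Mid() loop
// or a parsing function would fill ls_field[] here

IF IsDate ( ls_field[2] ) AND IsNumber ( ls_field[3] ) THEN
	ld_order = Date ( ls_field[2] )
	ll_qty   = Long ( ls_field[3] )
	INSERT INTO transfer_orders
	       ( rawdatakey, datetimeentered, sourceid, order_date, order_qty, processedflag )
	VALUES ( :ll_raw_key, :ldt_now, :ls_sourceid, :ld_order, :ll_qty, 'N' ) ;
ELSE
	// record the failure so the transaction can be traced later
	INSERT INTO err_log ( rawdatakey, datetimeentered, errortext )
	VALUES ( :ll_raw_key, :ldt_now, 'datatype check failed' ) ;
END IF

// either way, this raw record has been dealt with
UPDATE raw_data SET processedflag = 'Y' WHERE uniquekey = :ll_raw_key ;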

[application table(s)]
	Data is brought into the application tables and business rules are applied
	Process needs to mark the transfer table records as processed
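
	The last hop can be as simple as an INSERT ... SELECT with the bookkeeping update behind it. A minimal sketch, again with the hypothetical transfer_orders table and an orders application table:

// move the validated rows into the application table
INSERT INTO orders ( order_date, order_qty, sourceid )
SELECT order_date, order_qty, sourceid
  FROM transfer_orders
 WHERE sourceid      = :ls_sourceid
   AND processedflag = 'N' ;

IF SQLCA.SQLCode = 0 THEN
	// mark the transfer rows so the next load can clear them out
	UPDATE transfer_orders
	   SET processedflag = 'Y'
	 WHERE sourceid      = :ls_sourceid
	   AND processedflag = 'N' ;
	COMMIT ;
ELSE
	ROLLBACK ;
END IF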

One way in PowerBuilder to bring the data into the application is through the LibraryExport function. With this technique you set up a .pbl file containing the DataWindow object for a specific dataset you are expecting from the records in the transfer table. A nice feature of this is that if there is some minor change to the data being sent (an additional column, a change to a column, etc.), you can easily and quickly make the change, deploy the .pbl file, and be back in business without having to rebuild the entire application. This assumes the change doesn’t require a processing logic change within the application itself.

//get the DataWindow object from the .pbl
string ls_PBLName, ls_data_object, ls_DWSyntax, ls_error

// read the library and object names from the ini file
ls_PBLName     = ProfileString ( is_ini_file, is_section, 'PblName', '' )
ls_data_object = ProfileString ( is_ini_file, is_section, 'DataObject', '' )

IF ls_PBLName <> '' AND ls_data_object <> '' THEN
	// export the DataWindow source from the .pbl
	ls_DWSyntax = LibraryExport ( ls_PBLName, ls_data_object, ExportDataWindow! )
	// build the DataWindow from the exported syntax
	IF dw_1.Create ( ls_DWSyntax, ls_error ) = -1 THEN
		// handle the failure (log ls_error, alert the user, etc.)
	END IF
ELSEIF ls_data_object <> '' THEN
	// fall back to an object compiled into the application
	dw_1.DataObject = ls_data_object
ELSE
	// whatever else you want to do.
END IF

// Create() severs the transaction object association, so reset it before retrieving
dw_1.SetTransObject ( SQLCA )
dw_1.Retrieve ( )
// and there you go...
