PowerBuilder – Data Load/ETL process
One common piece of functionality in many business applications is the ability to import data from an external source. This could be anything from invoices and orders to inventory forecasts and patient medical histories. Quite often, if this type of process is not considered in a generic context, you end up with a solution that is too specific and suited only to the immediate need. As time goes on you can end up ‘reinventing the wheel’ many times over, leading to a maintenance nightmare.
If you are lucky enough to be dealing with clients using some sort of standard (EDI, HL7, etc.) or if you are able to establish the ‘standard’ file layout you have passed a major hurdle. At this point you can develop the process of getting the ‘messages’ (raw data records) into the database. In general, my vote for this phase is a process consisting of stored procedures which are triggered by some type of scheduled job. Remember, at this phase you are only importing the ‘raw’ data into a preliminary set of ‘raw data tables’; no business rule validation is taking place.
From an auditing standpoint I tend to keep copies of the data at each stage of the process. By this I mean I copy the original file to an archive location, import the file into a raw data table, process the raw data into another table where datatype checking and message ‘completeness’ (proper length, correct number of segments, etc.) are performed, and then bring the records into the data tables of the application itself (applying business logic at that point). This way, if there is any question about what happened to a given transaction, I can trace its path step by step.
General Process Overview
[file] & [file identifier table]
- File contains the data from the source as received/retrieved
- Files stored in a unique folder (date based) with a unique name
- File identifier table has a simple structure:
    uniquekey
    sourceid
    datetimereceived
    filename
    location
    processedflag
- Process needs to:
    - monitor for files
    - create the destination location
    - copy the file to the destination location
    - erase the file from the source
    - insert a record into the file identifier table
    - possibly send an acknowledgement back to the file originator
    - possibly launch the next step

[raw data table]
- Has a single entry for each record in the file
- Simple structure:
    uniquekey
    sourceid
    fileidentifierkey (uniquekey from file identifier table)
    datetimeentered (when the record got into this table)
    data (single record from file)
    possible sequence number if multiples are sent from the source in a single file
    processedflag
- If the incoming data is many single-record transmissions, you may want to eliminate the file and file identifier table stage and simply put the messages here.
- Process needs to:
    - read the file identifier table for unprocessed records
    - open the file
    - parse the file and insert individual records into the raw data table
    - mark the record in the file identifier table as processed
    - possibly send errors back to the file originator
    - possibly launch the next step

[transfer table(s)]
- Temporary holding area for data being loaded into the application
- Records marked processed are removed prior to the next load
- May be several tables if the raw data contains repeating segments (EDI/Gentran type transmissions)
- Column datatypes are used (rather than a single raw-text column)
- Table structure can be complex:
    uniquekey
    rawdatakey (uniquekey from raw data table)
    datetimeentered
    sourceid
    (whatever columns are needed to hold the data)
    possible sequence number if multiples are sent from the source in a single file
    processedflag
- Child or ancillary tables will need a foreign key relation to the master table
- Process needs to:
    - remove rows marked as processed for the same sourceid
    - get records from the raw data table for the sourceid which have not been processed
    - validate datatypes; write an entry to an error/log table if a record is incorrect
    - validate ‘completeness’ of the data (all required segments, length of data, etc.); write an entry to an error/log table if a record is incorrect
    - mark the raw data table record as processed
    - possibly send errors back to the file originator
    - possibly launch the next step

[application table(s)]
- Data brought into the application
- Process needs to mark the transfer table records as processed
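The ‘file into raw data table’ step above can be sketched in PowerScript. This is only a sketch under assumed names: the fileidentifier and rawdata tables, their columns, the instance variable is_sourceid, and the SQL Server getdate() call are all illustrative, not part of any standard layout.

```powerscript
// Sketch: read unprocessed entries from the file identifier table,
// parse each file line by line into the raw data table, then mark
// the file identifier record as processed.
long    ll_filekey, ll_seq
integer li_file
string  ls_filename, ls_location, ls_path, ls_record

DECLARE cur_files CURSOR FOR
	SELECT uniquekey, filename, location
	  FROM fileidentifier
	 WHERE processedflag = 'N' ;

OPEN cur_files ;
FETCH cur_files INTO :ll_filekey, :ls_filename, :ls_location ;

DO WHILE SQLCA.SQLCode = 0
	ls_path = ls_location + '\' + ls_filename
	li_file = FileOpen ( ls_path, LineMode! )
	IF li_file > 0 THEN
		ll_seq = 0
		// FileRead returns -100 at end of file, -1 on error
		DO WHILE FileRead ( li_file, ls_record ) >= 0
			ll_seq ++
			INSERT INTO rawdata
				( sourceid, fileidentifierkey, datetimeentered, data, seqnum, processedflag )
			VALUES
				( :is_sourceid, :ll_filekey, getdate(), :ls_record, :ll_seq, 'N' ) ;
		LOOP
		FileClose ( li_file )
		UPDATE fileidentifier
		   SET processedflag = 'Y'
		 WHERE uniquekey = :ll_filekey ;
		COMMIT ;
	END IF
	FETCH cur_files INTO :ll_filekey, :ls_filename, :ls_location ;
LOOP
CLOSE cur_files ;
```

A real version would also log parse errors, roll back on a failed insert, and send any acknowledgement back to the originator, per the process lists above.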
One way in PowerBuilder to bring the data into the application is through the LibraryExport function. With this technique you can set up a .pbl file which contains the DataWindow object for a specific dataset you are expecting from the records in the transfer table. A nice feature of this is that if there is some minor change to the data being sent (additional column, change in a column, etc.), you can easily and quickly make the change, deploy the .pbl file, and you are back in business without having to rebuild the entire application. This assumes the change doesn’t require some processing logic change within the application.
//get dw from pbl
string ls_PBLName, ls_data_object, ls_DWSyntax, ls_errors

//read stuff from ini file
ls_PBLName = ProfileString ( is_ini_file, is_section, 'PblName', '' )
ls_data_object = ProfileString ( is_ini_file, is_section, 'DataObject', '' )

IF ls_PBLName <> '' AND ls_data_object <> '' THEN
    // export the DataWindow source from the .pbl
    ls_DWSyntax = LibraryExport ( ls_PBLName, ls_data_object, ExportDataWindow! )
    // create the datawindow from the exported syntax
    dw_1.Create ( ls_DWSyntax, ls_errors )
ELSEIF ls_data_object <> '' THEN
    // fall back to a dataobject compiled into the application
    dw_1.dataobject = ls_data_object
ELSE
    // whatever else you want to do (log the missing configuration, etc.)
END IF

dw_1.SetTransObject ( SQLCA )
dw_1.Retrieve ( )
// and there you go...
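For reference, the INI section read by those ProfileString calls might look like the fragment below. The ‘PblName’ and ‘DataObject’ keys match the code; the section name, path, and object name are hypothetical examples.

```ini
; section name is whatever is_section points at in your app
[ImportDW]
; deployed library containing the DataWindow object (assumed path)
PblName=c:\myapp\import_dw.pbl
; DataWindow object to export and create (assumed name)
DataObject=d_invoice_import
```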