The present invention is a multi-layer computer architecture which separately extracts Assembler Language Code (ALC) business and logical functions and data. The architecture creates a Java object model, or another target-language object model, which allows ALC data to be compared with target-language data to verify logical processes. These object models can be directly traced back to the legacy ALC. The data model is automatically generated from a scan of the ALC and leverages generic patterns which can be reused to generate Java representations of other legacy code bases.
The invention described herein was made by an employee of the United States Government and may be manufactured and used by the Government of the United States of America for governmental purposes without the payment of any royalties thereon or therefor.
This patent application claims the benefit of U.S. Provisional Applications Nos. 62/445,603 filed Jan. 12, 2017 and 62/445,188 filed Jan. 11, 2017. The above applications are incorporated by reference herein in their entirety.
This invention relates to the field of programming languages and, more specifically, to a method for translating assembler language code.
The Internal Revenue Service (IRS) currently has two co-pending U.S. patent applications entitled “Translation of Assembler Code Using Intermediary Technical Rules Language (TRL)” and “Method for Translation of Assembler to Valid Object Oriented Programming Language.”
These co-pending patent applications teach various methods and structures for translating legacy assembler computer language to an object-oriented target language using an intermediary Technical Rules Language (TRL).
The IRS currently has a large inventory of legacy applications written in IBM Assembler Language Code (ALC). Recently, the IRS Enterprise Services (ES) team developed ALC-to-Java translation tools to support the migration of the IRS Individual Master File (IMF) ALC applications to Java.
There are many challenges in translating from a machine-level or low-level programming language, such as ALC, to a high-level programming language such as Java. Fundamentally, the coding conventions used in ALC are very difficult for developers trained in modern coding practices to grasp.
FIG. 1 illustrates a complex ALC control flow example. As illustrated in FIG. 1, ALC is characterized by complexities such as the lack of basic conditional logic constructs (if-then-else and loops). This emanates from non-standard subroutine linkage, reflecting that some of the code was written in 1968, when no coding standards existed. ALC exhibits widespread … These and other complexities create numerous challenges in translating from the source language, ALC, to the target language, Java.
There is an unmet need for a multi-layer computer architecture which can create ALC to Java Object Models (JOMs) capable of being directly traced back to the legacy ALC.
There is a further unmet need for a computer architecture with configured rule sets that can be reused to generate Java representations of other legacy code bases.
The present invention is a computer architecture for translating ALC to Java. A layered data model ultimately represents IMF data structures in a Java Object Model (JOM), and these data structures can be traced back to the legacy ALC. An ALC program is executed in a mainframe environment, while the TRL program is executed in a Java Runtime Environment (JRE).
At a selected break point, instrumented ALC code can dump the mainframe program memory to flat file(s). The mainframe memory dump file is then transferred to the Java environment. The TRL engine can load the memory dump from the file, convert the physical memory dump to TRL data memory buffers and variable values, and then execute the TRL program.
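To illustrate what converting a physical memory dump into variable values can involve, the following is a minimal Java sketch, not the TRL engine's actual implementation. It assumes the dump is a raw big-endian byte image and that each variable's displacement, length, and ALC data type (fullword F, halfword H, or packed decimal P) are already known. The class `DumpDecoder` and its methods are hypothetical names used only for illustration.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

/** Hypothetical helper that reads typed ALC variable values from a raw memory image. */
final class DumpDecoder {
    private DumpDecoder() {}

    /** Fullword (type F): 4-byte big-endian two's-complement integer. */
    static int readFullword(byte[] dump, int offset) {
        // ByteBuffer is big-endian by default, matching mainframe byte order
        return ByteBuffer.wrap(dump, offset, 4).getInt();
    }

    /** Halfword (type H): 2-byte big-endian signed integer. */
    static short readHalfword(byte[] dump, int offset) {
        return ByteBuffer.wrap(dump, offset, 2).getShort();
    }

    /**
     * Packed decimal (type P): two BCD digits per byte, except that the low nibble
     * of the last byte is the sign (X'B' or X'D' negative; X'A', X'C', X'E', X'F' positive).
     * Limited here to 9 bytes (17 digits) so the value fits in a long.
     */
    static BigDecimal readPacked(byte[] dump, int offset, int length, int scale) {
        if (length < 1 || length > 9) {
            throw new IllegalArgumentException("unsupported packed length: " + length);
        }
        long value = 0;
        for (int i = 0; i < length; i++) {
            int b = dump[offset + i] & 0xFF;
            value = value * 10 + digit(b >>> 4);
            if (i < length - 1) {
                value = value * 10 + digit(b & 0x0F);
            } else {
                int sign = b & 0x0F;
                if (sign < 0xA) {
                    throw new IllegalArgumentException("invalid sign nibble: " + sign);
                }
                if (sign == 0xB || sign == 0xD) {
                    value = -value;
                }
            }
        }
        return BigDecimal.valueOf(value, scale);
    }

    private static int digit(int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("invalid packed digit: " + nibble);
        }
        return nibble;
    }
}
```

For example, the three bytes X'12345C' read as a packed field with two implied decimal places yield 123.45, and the fullword X'FFFFFFFE' yields -2.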
At another selected break point, ALC code can create a second memory dump, to be compared to the TRL program execution result at the same break point.
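The comparison at the second break point might be sketched as follows. This assumes that the ALC dump and the TRL memory buffer share the same layout and that a map from ALC variable names to offsets and lengths is available, for instance from the extracted data structures. `DumpComparator` and `Field` are illustrative names only, and a byte-for-byte check is just one possible comparison policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: reports variables whose bytes differ between two memory images. */
final class DumpComparator {
    private DumpComparator() {}

    /** Location of one ALC variable inside a memory image (illustrative layout entry). */
    static final class Field {
        final int offset;
        final int length;

        Field(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns the names of variables whose bytes differ between the ALC dump and the TRL buffer. */
    static List<String> mismatches(byte[] alcDump, byte[] trlBuffer, Map<String, Field> layout) {
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, Field> e : layout.entrySet()) {
            Field f = e.getValue();
            int end = f.offset + f.length;
            if (end > alcDump.length || end > trlBuffer.length) {
                diffs.add(e.getKey()); // field lies outside one of the images
                continue;
            }
            // Range overload of Arrays.equals (Java 9+) compares the [offset, end) slices
            if (!Arrays.equals(alcDump, f.offset, end, trlBuffer, f.offset, end)) {
                diffs.add(e.getKey());
            }
        }
        return diffs;
    }
}
```

Reporting by variable name rather than by raw offset keeps a mismatch traceable to a specific ALC data declaration.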
One model includes a physical layer which simulates the ALC physical memory structures. A two-layer data model performs Analyzer functions on ALC program files to detect code features such as subroutines and self-modifying code. The ALC-to-TRL Translator Tool generates TRL files, detects translation errors, and updates itself until no errors are detected.
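One kind of code an Analyzer function may need to detect is self-modifying code. A common ALC idiom is a branch switch: a NOP instruction whose condition mask is later changed by an OI instruction so that it becomes an unconditional branch. The following minimal Java sketch shows one possible heuristic, assuming simplified parsed statements: it flags storage-modifying instructions whose target symbol labels an executable instruction. The class `SmcDetector` and its mnemonic sets are hypothetical, and stores through computed addresses are not detected.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical Analyzer heuristic: flags stores whose target symbol names an executable instruction. */
final class SmcDetector {
    private SmcDetector() {}

    /** Simplified statement: optional label, mnemonic, and first (target) operand text. */
    static final class Stmt {
        final String label;
        final String op;
        final String operand1;

        Stmt(String label, String op, String operand1) {
            this.label = label;
            this.op = op;
            this.operand1 = operand1;
        }
    }

    // Instructions whose first operand is the storage location written (illustrative subset)
    private static final Set<String> STORES = Set.of("MVI", "OI", "NI", "XI", "MVC");
    // Statements that define data or symbols rather than executable instructions
    private static final Set<String> NON_EXECUTABLE = Set.of("DS", "DC", "EQU");

    /** Returns the indices of statements that store into an instruction. */
    static List<Integer> findSelfModifyingStores(List<Stmt> code) {
        Set<String> instructionLabels = new HashSet<>();
        for (Stmt s : code) {
            if (s.label != null && !NON_EXECUTABLE.contains(s.op)) {
                instructionLabels.add(s.label);
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            Stmt s = code.get(i);
            if (!STORES.contains(s.op) || s.operand1 == null) {
                continue;
            }
            // Reduce "SWITCH+1" or "SWITCH(1)" to the base symbol "SWITCH"
            String symbol = s.operand1.split("[+\\-(]", 2)[0];
            if (instructionLabels.contains(symbol)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Given `SWITCH NOP SKIP` followed later by `OI SWITCH+1,X'F0'`, the OI statement is flagged, while an MVC into a DS-defined work area is not.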
Data structures are extracted from ALC code and converted to Java Object Models (JOM). JOMs can be directly traced back to the legacy ALC.
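As a rough illustration of extraction with traceability, the following Java sketch derives field offsets from DS statements in a DSECT and records the ALC source line of each field. It is a minimal sketch under simplifying assumptions: only the C, X, P, F, H, and D types are handled, with an optional duplication factor and explicit length, and F, H, and D fields without an explicit length are aligned to their natural boundary. `DsectExtractor` and `FieldDef` are hypothetical names, not the Data Extraction Tool's actual interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: derives field layouts, with source-line traceability, from DS statements. */
final class DsectExtractor {
    private DsectExtractor() {}

    /** One extracted field; sourceLine (1-based) traces it back to the ALC statement. */
    static final class FieldDef {
        final String name;
        final char type;
        final int offset;
        final int length;
        final int sourceLine;

        FieldDef(String name, char type, int offset, int length, int sourceLine) {
            this.name = name;
            this.type = type;
            this.offset = offset;
            this.length = length;
            this.sourceLine = sourceLine;
        }
    }

    // Optional label, DS, optional duplication factor, type letter, optional explicit length
    private static final Pattern DS =
        Pattern.compile("^(\\S+)?\\s+DS\\s+(\\d*)([CXPFHD])(?:L(\\d+))?.*$");

    static List<FieldDef> extract(List<String> lines) {
        List<FieldDef> fields = new ArrayList<>();
        int offset = 0; // displacement from the start of the DSECT
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = DS.matcher(lines.get(i));
            if (!m.matches()) {
                continue; // statements other than DS (e.g. the DSECT header) are skipped
            }
            int dup = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            char type = m.group(3).charAt(0);
            int implicit = implicitLength(type);
            boolean explicit = m.group(4) != null;
            int length = explicit ? Integer.parseInt(m.group(4)) : implicit;
            if (!explicit && implicit > 1) {
                // F, H and D without an explicit length align to their natural boundary
                offset = (offset + implicit - 1) / implicit * implicit;
            }
            if (m.group(1) != null) { // unlabeled DS statements still reserve space
                fields.add(new FieldDef(m.group(1), type, offset, length, i + 1));
            }
            offset += dup * length;
        }
        return fields;
    }

    private static int implicitLength(char type) {
        switch (type) {
            case 'F': return 4;
            case 'H': return 2;
            case 'D': return 8;
            default:  return 1; // C, X, P
        }
    }
}
```

For a DSECT containing `NAME DS CL20`, `FLAG DS X`, and `COUNT DS F`, COUNT is placed at offset 24 rather than 21 because of fullword alignment, and each field keeps the line number of the statement that declared it.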
FIG. 1 is a comparison of ALC and object-oriented control flow structures.
FIG. 2 is a flow chart illustrating an exemplary execution of an ALC to Java translation process.
FIG. 3 illustrates one embodiment of a multi-layered computer architecture for ALC to Java translation.
FIG. 4 illustrates an exemplary test of an ALC to TRL conversion using a data dump process.
FIG. 5 illustrates a base/displacement addressing scheme in ALC and TRL.
FIG. 6 illustrates exemplary self-modifying code in ALC.
FIG. 7 illustrates exemplary data structures used for translation of self-modifying code.
FIG. 8 illustrates an exemplary Data Extraction Tool used to generate JOM data structures.
FIG. 9 is an illustration of serialized extraction output.
FIG. 10 is an exemplary mapping of extracted Java objects to an ALC data declaration.
FIG. 11 is an exemplary extraction log.
FIG. 12 illustrates an exemplary flow chart of an ALC to TRL processing tool.
FIG. 13 is a flow chart of an exemplary function for parsing ALC code.
FIG. 14 is a flow chart of an exemplary function for translating non-branching ALC instructions.
FIG. 15 is a flow chart of an exemplary method for writing scanner rules to override ALC instructions.
FIG. 16 is a flow chart of an exemplary method for converting the ALC program into a Control Flow Graph (CFG) representation.
FIGS. 17a and 17b are flow charts of exemplary functions for detecting subroutines for CFG conversion.
FIG. 18 illustrates a flow chart of an exemplary function for determining sub-member nodes.
FIG. 19 illustrates a flow chart of an exemplary function for determining return-destination information of a subroutine.
FIG. 20 illustrates a flow chart of an exemplary function for translating ALC subroutines to TRL.
FIGS. 21 through 27 illustrate the transformation of various subroutines to CFG representations.
FIG. 28 is a flow chart of an exemplary function for transforming subroutines.
FIG. 29 is a flow chart for a fake loop detection algorithm.
FIG. 30 illustrates an exemplary verification method for translated TRL code.
FIG. 31 illustrates an exemplary memory dump file organization.
FIG. 32 illustrates an exemplary five-layer software architecture.
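Regarding the base/displacement addressing of FIG. 5: in ALC, a storage operand is typically encoded as a 4-bit base register number and a 12-bit displacement, optionally plus an index register, and register 0 used as a base or index contributes zero. A TRL runtime that simulates physical memory would need to reproduce this address arithmetic. The following is a minimal Java sketch assuming 31-bit addressing mode; `BaseDisplacement` and its methods are hypothetical and do not represent the TRL engine's actual code.

```java
/**
 * Hypothetical sketch of base/displacement address resolution, as a TRL
 * runtime might emulate it. Assumes 31-bit addressing mode.
 */
final class BaseDisplacement {
    private BaseDisplacement() {}

    static final int MAX_DISPLACEMENT = 0xFFF; // 12-bit displacement field

    /** Splits a 16-bit base/displacement operand field into {base register, displacement}. */
    static int[] decode(int operandField) {
        return new int[] {(operandField >>> 12) & 0xF, operandField & 0xFFF};
    }

    /**
     * Effective address = (index register) + (base register) + displacement.
     * Register 0 used as an index or base contributes zero.
     */
    static int effectiveAddress(int[] regs, int index, int base, int displacement) {
        if (displacement < 0 || displacement > MAX_DISPLACEMENT) {
            throw new IllegalArgumentException("displacement out of range: " + displacement);
        }
        long sum = displacement;
        if (index != 0) sum += regs[index] & 0xFFFFFFFFL; // treat register contents as unsigned
        if (base != 0) sum += regs[base] & 0xFFFFFFFFL;
        return (int) (sum & 0x7FFFFFFFL); // keep the low 31 bits
    }
}
```

For example, with register 12 holding X'00010000', the operand field X'C024' decodes to base 12 and displacement X'024', giving the effective address X'00010024'.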
As used herein, the term “ANTLR” or “ANother Tool for Language Recognition” means a program known in the art for reading, processing, executing, and translating structured text or binary files according to a set of rules referred to as a grammar.
As used herein, the term “Assembler Language Code (ALC)” means a low-level programming language for a computer, or other programmable device, in which there is a very strong (generally one-to-one) correspondence between the language and the architecture's machine code instructions. Each assembly language is specific to a particular computer architecture.
As used herein, the term “Analyzer Tool” means a set of functions to analyze a run of ALC and provide information about the code including but not limited to subroutines, self-modified code, and certain patterns.
As used herein, the term “block” or “run” means a section of ALC which has been isolated for processing, which may or may not be functionally related in some manner.
As used herein, the term “Configuration Files” means files containing ALC; in various embodiments, Configuration Files may include Analyzer Tool and SME inputs.
As used herein, the term “Control Flow Graph (CFG)” means a graphical representation of how instructions or function calls of an imperative program are executed or evaluated.
As used herein, the term “CSECT” means a separate, relocatable block of code and/or data. Subroutines may be separately compiled into CSECTs. A symbol may address the beginning of a CSECT.
As used herein, the term “Data Extraction Tool” means one or more functions which parse and scan the source ALC for lines of code that contain schema information about how data variables are defined and how the data is stored in physical memory.
As used herein, the term “data” includes data values and schema.
As used herein, the term “DSECT” means a section of code which describes the layout of an area of storage without reserving virtual storage for the area that is described. A DSECT layout may reference any area of storage which is addressable by the program. Symbolic names in the DSECT can be used to extract data from the underlying storage area.
As used herein, the term “dump” or “memory dump” means a process in which the contents of memory are displayed and stored, or the resulting set of data used for analysis and/or verification.
As used herein, the term “Individual Master File (IMF)” means an ALC application that receives data from multiple sources.
As used herein, the term “Java Data Objects” means objects generated by the Data Extraction Tool which contain data necessary in a runtime environment.
As used herein, the term “Java Object Model (JOM)” refers to an object which contains extracted data structure definitions that can be directly traced back to ALC or another legacy program.
As used herein, the term “Java Runtime Environment” means a software package that contains what is required to run a Java program.
As used herein, the term “legacy language” means ALC or any language specific to a particular operating system which must be translated to an object-oriented programming language or another target language.
As used herein, the term “normalizing” means any process of conforming schema and logic within a programming language to any rule or standard, e.g., in furtherance of translation from one language to another.
As used herein, the term “rule(s) engine” means software to infer consequences or perform functions based on conditions or facts. There are also examples of probabilistic rule engines, including Pei Wang's non-axiomatic reasoning system, and probabilistic logic networks.
As used herein, the term “schema” means a description of the attributes and location of data.
As used herein, the term “self-modifying code” (SMC) means code that alters its own instructions while it is executing, in which the self-modification is intentional.
As used herein, the term “sequential file format” means a set of logical sequential instructions.
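To make the Control Flow Graph (CFG) definition concrete, the following Java sketch shows the textbook basic-block construction: the first instruction, every branch target, and every instruction following a branch start a new block, and each block is linked to its taken-branch target and, unless the branch is unconditional, to its fall-through successor. This is a generic technique offered as an illustration, not the conversion method of FIG. 16. The simplified `Instr` model and the `CfgBuilder` name are hypothetical. Register branches such as BR 14 are not modeled; resolving them requires subroutine and return-destination analysis of the kind FIGS. 17a through 19 address.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: partitions an ALC instruction list into basic blocks and links them. */
final class CfgBuilder {
    private CfgBuilder() {}

    /** Simplified statement: optional label, mnemonic, optional branch-target label. */
    static final class Instr {
        final String label;
        final String op;
        final String target;

        Instr(String label, String op, String target) {
            this.label = label;
            this.op = op;
            this.target = target;
        }

        boolean isBranch() { return target != null; }

        boolean isUnconditional() { return "B".equals(op); }
    }

    /** Maps each block's first-instruction index to the first-instruction indices of its successors. */
    static SortedMap<Integer, SortedSet<Integer>> build(List<Instr> code) {
        SortedMap<Integer, SortedSet<Integer>> cfg = new TreeMap<>();
        if (code.isEmpty()) return cfg;
        Map<String, Integer> labels = new HashMap<>();
        for (int i = 0; i < code.size(); i++) {
            if (code.get(i).label != null) labels.put(code.get(i).label, i);
        }
        // Leaders: first instruction, every branch target, every instruction after a branch
        SortedSet<Integer> leaders = new TreeSet<>();
        leaders.add(0);
        for (int i = 0; i < code.size(); i++) {
            if (code.get(i).isBranch()) {
                leaders.add(resolve(labels, code.get(i).target));
                if (i + 1 < code.size()) leaders.add(i + 1);
            }
        }
        // Edges: taken-branch edge, plus fall-through edge unless the branch is unconditional
        List<Integer> starts = new ArrayList<>(leaders);
        for (int k = 0; k < starts.size(); k++) {
            int end = (k + 1 < starts.size() ? starts.get(k + 1) : code.size()) - 1;
            Instr last = code.get(end);
            SortedSet<Integer> succ = new TreeSet<>();
            if (last.isBranch()) succ.add(resolve(labels, last.target));
            if (!last.isUnconditional() && end + 1 < code.size()) succ.add(end + 1);
            cfg.put(starts.get(k), succ);
        }
        return cfg;
    }

    private static int resolve(Map<String, Integer> labels, String target) {
        Integer idx = labels.get(target);
        if (idx == null) throw new IllegalArgumentException("undefined label: " + target);
        return idx;
    }
}
```

In a listing where a BCT loop is followed by `B DONE` and then an MVC, the block holding the MVC has no predecessors, so a later pass could flag it as unreachable.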