Extract code from Universal Hex files generated by the PXT-based Microsoft MakeCode IDE for micro:bit

maehw/microbit-pxt-code-extractor

BBC micro:bit PXT code extractor

This project attempts to extract the code from so-called Universal Hex files generated by the PXT-based Microsoft MakeCode IDE for micro:bit (web IDE). PXT uses a technique called source embedding to add the code as (possibly compressed) text to the 0x0D ("Custom Data") records of an Intel HEX file.
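To illustrate the idea of source embedding, here is a minimal sketch of how the payload of 0x0D records could be collected from Intel HEX text. It relies only on the standard Intel HEX record layout (`:LLAAAATT<data>CC`). The function names are hypothetical, and the assumption that the embedded source is simply the concatenation of all 0x0D payloads in file order is mine; the actual script may treat padding or multiple sections of a Universal Hex file differently.

```python
def iter_records(hex_text):
    """Yield (record_type, address, data) for every record line in Intel HEX text."""
    for line in hex_text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip blank or non-record lines
        raw = bytes.fromhex(line[1:])
        length = raw[0]
        address = int.from_bytes(raw[1:3], "big")
        record_type = raw[3]
        data = raw[4:4 + length]
        # All bytes of a record, including the checksum, sum to 0 modulo 256.
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"bad checksum in record: {line}")
        yield record_type, address, data


def collect_custom_data(hex_text, record_type=0x0D):
    """Concatenate the payload of all records of the given type (assumption: file order)."""
    return b"".join(
        data for rtype, _addr, data in iter_records(hex_text) if rtype == record_type
    )
```

Reading a HEX file with `open(path).read()` and passing the text to `collect_custom_data` would return the raw embedded bytes under these assumptions.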

The code extractor itself is implemented as a Python 3 script.

This project does not support extracting Python code from the BBC micro:bit. To do so, use the uBitTool. As soon as the PXT code extractor is in a sufficiently working state, it may be added to the uBitTool; feel free to create a pull request.

Usage

  • Clone this git repository
  • Make sure the Python module dependencies are met: pip3 install intelhex (argparse and lzma are part of the Python 3 standard library and normally need no separate installation)
  • Run the script from a Python 3 environment (it should run under Windows, Linux and macOS):
  usage: extract.py [-h] [file]

  extract.py

  positional arguments:
    file        path to bbc micro:bit HEX input file

  options:
    -h, --help  show this help message and exit

Warning: The Python script automatically creates an output folder named after the input file (without extension).

The following files are created by the tool and contain data from intermediate extraction steps:

  1. _code_header.json
  2. _lzma_compressed_text.bin
  3. _packed_code.txt
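The intermediate file `_lzma_compressed_text.bin` can be inspected independently of the tool. The sketch below decompresses it with Python's standard `lzma` module. In the example dump further down, the bytes directly after the embedded JSON header start with `5D 00 00 80 00`, which looks like the legacy ".lzma" (LZMA alone) container, so `lzma.FORMAT_ALONE` is used here; that format choice and the UTF-8 decoding are assumptions, and the function name is hypothetical.

```python
import lzma


def decompress_text(compressed: bytes) -> str:
    """Decompress a legacy .lzma ("alone") stream and decode it as UTF-8 text."""
    raw = lzma.decompress(compressed, format=lzma.FORMAT_ALONE)
    return raw.decode("utf-8")


# Possible usage (path taken from the example output folder):
# from pathlib import Path
# text = decompress_text(Path("sound-device/_lzma_compressed_text.bin").read_bytes())
```

If the assumption holds, the returned text should correspond to the contents of `_packed_code.txt`.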

Example usage and output files:

  $ python extract.py sound-device.hex       
Input file w/o extension: sound-device
           Output folder: /Users/matthias/local_repos/microbit-pxt-code-extractor/sound-device
-------------------------------------------------------------------------
Embedded source dump:
0000  41 14 0E 2F B8 2F A2 BB 9D 00 40 10 00 00 00 00  |A.././....@.....|
0010  7B 22 63 6F 6D 70 72 65 73 73 69 6F 6E 22 3A 22  |{"compression":"|
0020  4C 5A 4D 41 22 2C 22 68 65 61 64 65 72 53 69 7A  |LZMA","headerSiz|
0030  65 22 3A 32 39 34 2C 22 74 65 78 74 53 69 7A 65  |e":294,"textSize|
0040  22 3A 31 37 35 31 30 2C 22 6E 61 6D 65 22 3A 22  |":17510,"name":"|
0050  73 6F 75 6E 64 2D 64 65 76 69 63 65 22 2C 22 65  |sound-device","e|
0060  55 52 4C 22 3A 22 68 74 74 70 73 3A 2F 2F 6D 61  |URL":"https://ma|
0070  6B 65 63 6F 64 65 2E 6D 69 63 72 6F 62 69 74 2E  |kecode.microbit.|
0080  6F 72 67 2F 22 2C 22 65 56 45 52 22 3A 22 35 2E  |org/","eVER":"5.|
0090  30 2E 31 32 22 2C 22 70 78 74 54 61 72 67 65 74  |0.12","pxtTarget|
00A0  22 3A 22 6D 69 63 72 6F 62 69 74 22 7D 5D 00 00  |":"microbit"}]..|
00B0  80 00 95 45 00 00 00 00 00 00 00 3D 88 89 C6 54  |...E.......=...T|
00C0  36 C3 17 4F E4 F9 EC 0D 07 A9 22 3E D4 1C 7C B5  |6..O......">..|.|
00D0  AF A5 88 58 62 DF 18 4A B0 53 1D A2 B3 BA 13 --  |...Xb..J.S..... |
...
-------------------------------------------------------------------------
JSON header length: 0x9D00 (157)
       Text length: 0x40100000 (4160)
          Reserved: 0x0000
-------------------------------------------------------------------------
Embedded JSON header (pretty-printed):
{
    "compression": "LZMA",
    "headerSize": 294,
    "textSize": 17510,
    "name": "sound-device",
    "eURL": "https://makecode.microbit.org/",
    "eVER": "5.0.12",
    "pxtTarget": "microbit"
}
Header size: 294
Text size: 17510
-------------------------------------------------------------------------
Text meta data:
  Length of text before truncation: 4163
   Length of text after truncation: 4160
  Text is LZMA-compressed.
  Writing LZMA compressed output text...
  Decompressing LZMA text...
Writing packed code...
-------------------------------------------------------------------------
Code header dump (pretty-printed)
{
    "name": "sound-device",
    "comment": "",
    "status": "unpublished",
    "cloudId": "pxt/microbit",
    "editor": "blocksprj",
    "targetVersions": {
        "branch": "v5.0.12",
        "tag": "v5.0.12",
        "commits": "https://github.com/microsoft/pxt-microbit/commits/97491d6832cccab6b5bdc05b58e4c6b5dcc18cdd",
        "target": "5.0.12",
        "pxt": "8.0.7"
    }
}
Writing code header JSON file...
-------------------------------------------------------------------------
Code payload analysis (pretty-printed)
  Length: 17519
   Files: ['README.md', 'main.blocks', 'main.ts', 'pxt.json', 'test.ts']
Writing file 'README.md'...
Writing file 'main.blocks'...
Writing file 'main.ts'...
Writing file 'pxt.json'...
Writing file 'test.ts'...
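The dump above suggests a 16-byte preamble in front of the embedded JSON header: 8 magic bytes, a 16-bit JSON header length (`9D 00` = 157), a 32-bit text length (`40 10 00 00` = 4160) and 16 reserved bits, all little-endian. A minimal sketch of parsing that layout with `struct` follows. The layout is inferred from this single example and the function name is hypothetical, so treat it as an illustration, not as the script's actual implementation.

```python
import json
import struct

# magic (8 bytes), JSON header length (uint16), text length (uint32), reserved (uint16)
PREAMBLE = struct.Struct("<8sHIH")


def parse_embedded_source(blob: bytes):
    """Split embedded source bytes into (magic, JSON header dict, text bytes)."""
    magic, json_len, text_len, _reserved = PREAMBLE.unpack_from(blob, 0)
    json_start = PREAMBLE.size            # 16 bytes
    text_start = json_start + json_len
    meta = json.loads(blob[json_start:text_start].decode("utf-8"))
    # Truncate to the declared text length; trailing padding is dropped.
    text = blob[text_start:text_start + text_len]
    return magic, meta, text
```

Cutting the text at the declared length mirrors the "before truncation" and "after truncation" lengths (4163 and 4160) reported in the tool output.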

The output folder created for this example contains the following files:

$ cd sound-device
matthias@maehcbook sound-device % ll
total 128
drwxr-xr-x  10 matthias  staff    320 31 Aug 22:22 .
drwxr-xr-x  10 matthias  staff    320 31 Aug 22:22 ..
-rw-r--r--   1 matthias  staff   1433 31 Aug 22:22 README.md
-rw-r--r--   1 matthias  staff    314 31 Aug 22:22 _code_header.json
-rw-r--r--   1 matthias  staff   4160 31 Aug 22:22 _lzma_compressed_text.bin
-rw-r--r--   1 matthias  staff  17813 31 Aug 22:22 _packed_code.txt
-rw-r--r--   1 matthias  staff  13402 31 Aug 22:22 main.blocks
-rw-r--r--   1 matthias  staff   1991 31 Aug 22:22 main.ts
-rw-r--r--   1 matthias  staff    537 31 Aug 22:22 pxt.json
-rw-r--r--   1 matthias  staff    129 31 Aug 22:22 test.ts
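The sizes above are consistent with `_packed_code.txt` (17813 bytes) being the code header (`headerSize` of 294) followed by the code payload (reported length 17519). A minimal sketch of splitting such packed text is shown below. It assumes that the payload is a JSON object mapping file names to file contents (which would match the "Files:" list in the output) and that `headerSize` counts characters of the decoded text; both are assumptions, and the function name is hypothetical.

```python
import json


def split_packed_code(packed: str, header_size: int):
    """Split packed code text into (code header dict, {file name: file content})."""
    header = json.loads(packed[:header_size])
    files = json.loads(packed[header_size:])
    return header, files


# Possible usage, writing each embedded file into the current folder:
# header, files = split_packed_code(packed_text, 294)
# for name, content in files.items():
#     with open(name, "w", encoding="utf-8") as fh:
#         fh.write(content)
```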

Contribution

Feel free to make any changes and support this project. ;)