Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: The Foreign Keys are getting broken down to their component pieces for MongoDB to BigQuery #911

Open
JoshAubrey opened this issue Jul 24, 2023 · 5 comments
Labels
bug Something isn't working needs triage p2 stale

Comments

@JoshAubrey
Copy link

Related Template(s)

MongoDB to BigQuery

What happened?

The Reference Ids (aka Foreign Keys), which are the same format as the primary MongoId / ObjectId / _id, are getting broken down into their individual component pieces.

Input
accountId: 63e131fb63ccf7e561f96920

Output to source_data in BigQuery:
"accountId":{"timestamp":1675702779,"counter":16345376,"randomValue1":6540535,"randomValue2":-6815},"

I figured out a temporary workaround with a GoogleSQL javascript UDF, but handling this in the template library will be much better.
Thank you!

Beam Version

Newer than 2.46.0

Relevant log output

No response

@JoshAubrey JoshAubrey added bug Something isn't working needs triage p2 labels Jul 24, 2023
@bvolpato
Copy link
Contributor

@theshanbhag Can you please take a look / triage this?

@JoshAubrey
Copy link
Author

JoshAubrey commented Aug 24, 2023

Here's an example of what I have been doing and how I worked around it for the moment if this helps:

  1. Reusable function for rebuilding the mongoIds
  2. De-duplicating the data in the table loaded from mongoDB
  3. Transforming the raw JSON data into a table for use in Google / Looker Data Studio using 1. above.
# rebuild MongoIds from their component pieces
CREATE TEMP FUNCTION
  convertId(input JSON)
  RETURNS STRING
  LANGUAGE js AS r"""

function getId(mongoId) {
  if (mongoId === null) {return null}
  
    const result = 
        DecimalHexTwosComplement(mongoId?.timestamp) +
        DecimalHexTwosComplement(mongoId?.randomValue1) +
        DecimalHexTwosComplement(mongoId?.randomValue2) +
        DecimalHexTwosComplement(mongoId?.counter);

    return result;
}

function DecimalHexTwosComplement(decimal) {

  if (decimal >= 0) {
    const hexadecimal = decimal.toString(16);
  
    return hexadecimal;
  } else {
    const hexadecimal = Math.abs(decimal).toString(16);

    let output = '';
    for (i = 0; i < hexadecimal.length; i++) {
      output += (0x0F - parseInt(hexadecimal[i], 16)).toString(16);
    }

    output = (0x01 + parseInt(output, 16)).toString(16);
    return output;
  }
}

return getId(input)

""";

  # Transactions de-deuplicate data
CREATE OR REPLACE TABLE
  `example-data.mongodb.transactions` AS (
  SELECT
    * EXCEPT(row_num)
  FROM (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp DESC ) row_num
    FROM
      `example-data.mongodb.transactions`) t
  WHERE
    row_num=1 );

  # Transactions Parse source_data and create transformed for use in data studio
CREATE OR REPLACE TABLE
  `example-data.mongodb.transactions-transformed` AS(
  SELECT
    timestamp,
    id AS id,
    SAFE_CAST(JSON_EXTRACT(source_data, "$.amount") AS DECIMAL) AS amount,
    SAFE_CAST(JSON_EXTRACT(source_data, "$.fee") AS DECIMAL) AS fee,
    SAFE_CAST(JSON_EXTRACT(source_data, "$.status") AS INT) AS status,
    SAFE_CAST(JSON_EXTRACT(source_data, "$.type") AS INT) AS type,
    SAFE_CAST(JSON_EXTRACT(source_data, "$.data.paymentMethod.type") AS INT) AS paymentType,
    convertId(PARSE_JSON(JSON_QUERY(source_data, "$.fid"))) AS accountId,
  FROM
    `example-data.mongodb.transactions`);
   

Copy link

This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.

@github-actions github-actions bot added the stale label May 20, 2024
@JoshAubrey
Copy link
Author

Bump

@Marcello09
Copy link

Although it is not a definitive solution, I've created a UDF that recursively transforms mongo objectid into strings. Here the code for my UDF. Hope it helps...

function parseOid(obj) {
    if (obj instanceof Array) {
        for(var i = 0; i < obj.length; i++) {
            obj[i] = parseOid(obj[i]);
        }
    } else {
        for(var prop in obj) {
           if (prop == "$oid") {
                obj = obj.$oid
                return obj
           }
           if (obj[prop] instanceof Object || obj[prop] instanceof Array) {
                obj[prop] = parseOid(obj[prop])
           }
        }
    }
    return obj
}

function process(inJson) {
    var obj = JSON.parse(inJson)
    obj = parseOid(obj)
    return JSON.stringify(obj)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage p2 stale
Projects
None yet
Development

No branches or pull requests

3 participants