[receiver/oracledb] discrepancy between the metric values in DBA_TABLESPACE_USAGE_METRICS and the ones calculated by the receiver #31451

dloucasfx · 2024-02-27T18:11:36Z

Component(s)

receiver/oracledb

What happened?

Description

The oracledb receiver uses the below [queries] respectively (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/oracledbreceiver/scraper.go#L41-L42) to set values for oracledb.tablespace_size.limit and oracledb.tablespace_size.usage in bytes

	tablespaceUsageSQL      = "select TABLESPACE_NAME, BYTES from DBA_DATA_FILES"
	tablespaceMaxSpaceSQL   = "select TABLESPACE_NAME, (BLOCK_SIZE*MAX_EXTENTS) AS VALUE FROM DBA_TABLESPACES"

Our backend uses those metrics to calculate the % used tablespace (oracledb.tablespace_size.usage/oracledb.tablespace_size.limit)*100

However, when we look at the values in table DBA_TABLESPACE_USAGE_METRICS https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/DBA_TABLESPACE_USAGE_METRICS.html#GUID-FE479528-BB37-4B55-92CF-9EC19EDF4F46 , the percentage used and the table size/used space, although in data blocks, still do not match the receiver's datapoints when converted.

Steps to Reproduce

enable the oracledb receiver and compare (oracledb.tablespace_size.usage/oracledb.tablespace_size.limit)*100 to USED_PERCENT from DBA_TABLESPACE_USAGE_METRICS

Expected Result

Both values should match

Actual Result

Different values

Collector version

0.93.0

The text was updated successfully, but these errors were encountered:

github-actions · 2024-02-27T18:16:44Z

Pinging code owners:

receiver/oracledb: @dmitryax @crobert-1 @atoulme

See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 · 2024-02-27T22:41:42Z

It looks like there's at least one problem here, and then I have a clarifying question to see if there's another.

oracledb.tablespace_size.limit = BLOCK_SIZE * MAX_EXTENTS looks wrong. From documentation:

At the finest level of granularity, Oracle stores data in data blocks (also called logical blocks, Oracle blocks, or pages). One data block corresponds to a specific number of bytes of physical database space on disk.

The next level of logical database space is called an extent. An extent is a specific number of contiguous data blocks allocated for storing a specific type of information.

The existing calculation is missing logic to account for how many data blocks are in each extent. We could use INITIAL_EXTENT and NEXT_EXTENT with MAX_EXTENTS together instead of BLOCK_SIZE, but I'm not sure if this is a definitive way to calculate this value.
2. While using oracledb.tablespace_size.usage in a calculation, do you know if its datapoints are summed together within the metric for this calculation? ~~If not, it would be the wrong usage value as it needs to be the sum of every data file's used bytes within the tablespace, in case there are multiple data files within the tablespace.~~ The metric has a datapoint for each tablespace, with the tablespace_name attribute.

We could also expose USED_PERCENT for each tablespace name if that's preferable, so the user can just opt in instead of doing their own backend calculation. It should be a pretty simple enhancement to just add another metric. However, we still need to fix incorrect metrics, either way.

crobert-1 · 2024-03-27T22:31:10Z

Background investigation

Doing some local testing with DBA_TABLESPACE_USAGE_METRICS, here are some interesting results using the SYSTEM tablespace.

DBA_TABLESPACE_USAGE_METRICS: used size: 109880, tablespace size: 3882269, used percent: 2.83030361883733455873356534542042295369. Note: Values for size are in blocks

Block size is 8192 bytes for each tablespace in my instance.

Findings on oracledb.tablespace_size.limit

This is the metric data for oracldb.tablespace_size.limit for the SYSTEM tablespace.

Metric #19
Descriptor:
     -> Name: oracledb.tablespace_size.limit
     -> Description: Maximum size of tablespace in bytes, -1 if unlimited.
     -> Unit: By
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> tablespace_name: Str(SYSTEM)
StartTimestamp: 2024-03-27 21:21:11.040735 +0000 UTC
Timestamp: 2024-03-27 21:21:12.103002 +0000 UTC
Value: 17592186019840

This value is in bytes, but converted to blocks it's 2147483645 blocks.

This means the metric we're calculating and sending as the limit is 553 times larger than what's being reported by DBA_TABLESPACE_USAGE_METRICS.

Findings on oracledb.tablespace_size.usage:

Metric #20
Descriptor:
     -> Name: oracledb.tablespace_size.usage
     -> Description: Used tablespace in bytes.
     -> Unit: By
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> tablespace_name: Str(SYSTEM)
StartTimestamp: 2024-03-27 21:21:11.040735 +0000 UTC
Timestamp: 2024-03-27 21:21:12.09212 +0000 UTC
Value: 917504000

This value is in bytes, so converting it to blocks we get the following:

917504000 / 8192 = 112000 blocks

The DBA_TABLESPACE_USAGE_METRICS reported value is 109880, so this is relatively close, IMO.

Proposed solution

I think the best solution here would be to rely on DBA_TABLESPACE_USAGE_METRICS as the definitive source for usage variables, then use BLOCK_SIZE from DBA_TABLESPACES to convert block counts to bytes for these metrics. This will result in a slight loss of granularity (as bytes will always be multiples of 8192) but I think this is acceptable. I think this is preferential over our current solution as queries needing to account for Oracle DB's inner architecture that changes based on instance doesn't seem like a stable and correct long term approach (as shown by this issue).

crobert-1 · 2024-03-27T22:39:36Z

I realize that this may just be a discrepancy between "maximum possible size of tablespace" and "tablespace size" to change how we're calculating oracledb.tablespace_size.limit, but I think the proposed alternative is a more useable value.

) **Description:** Values were being scraped incorrectly for the metrics `oracledb.tablespace_size.limit` and `oracledb.tablespace_size.usage`. The changes these metrics to be scraped from the [`DBA_TABLESPACE_USAGE_METRICS`](https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/DBA_TABLESPACE_USAGE_METRICS.html#GUID-FE479528-BB37-4B55-92CF-9EC19EDF4F46) table. This results in a slight loss of granularity in these metrics, as values will always be in multiples of the respective tablespace's block size, but I think the clarity and simplicity is worth the trade off. Note: The value of the usage metric was generally close to the expected value, but the limit was being calculated as potential theoretical capacity, unbound by server capacity. For example, in testing in a docker container on my local machine, limit was set to **17TB**. This doesn't line up with user expectations. **Link to tracking Issue:** Fixes #31451 **Testing:** Updated existing tests, added a couple new ones. Also, the original issue filed was comparing `DBA_TABLESPACE_USAGE_METRICS` output for percent used to what we got from `usage/limit * 100`. Here's the local testing outputs compared to show they now line up. ``` 2024-03-27T16:31:57.938-0700 info oracledbreceiver/scraper.go:285 DBA_TABLESPACE_USAGE_METRICS: Tablespace name: SYSTEM, used space: 111288, tablespace size: 3518587, percent used: 3.16286054600895188892586711654422641816 {"kind": "receiver", "name": "oracledb", "data_type": "metrics"} ``` ``` Metric #20 Descriptor: -> Name: oracledb.tablespace_size.usage -> Description: Used tablespace in bytes. -> Unit: By -> DataType: Gauge NumberDataPoints #0 Data point attributes: -> tablespace_name: Str(SYSTEM) StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC Value: 911671296 ``` ``` Metric #19 Descriptor: -> Name: oracledb.tablespace_size.limit -> Description: Maximum size of tablespace in bytes, -1 if unlimited. -> Unit: By -> DataType: Gauge NumberDataPoints #0 Data point attributes: -> tablespace_name: Str(SYSTEM) StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC Value: 28824264704 ``` Doing the same calculation, we get: ``` (911671296 / 28824264704) * 100 = ~3.16% ```

…n-telemetry#32028) **Description:** Values were being scraped incorrectly for the metrics `oracledb.tablespace_size.limit` and `oracledb.tablespace_size.usage`. The changes these metrics to be scraped from the [`DBA_TABLESPACE_USAGE_METRICS`](https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/DBA_TABLESPACE_USAGE_METRICS.html#GUID-FE479528-BB37-4B55-92CF-9EC19EDF4F46) table. This results in a slight loss of granularity in these metrics, as values will always be in multiples of the respective tablespace's block size, but I think the clarity and simplicity is worth the trade off. Note: The value of the usage metric was generally close to the expected value, but the limit was being calculated as potential theoretical capacity, unbound by server capacity. For example, in testing in a docker container on my local machine, limit was set to **17TB**. This doesn't line up with user expectations. **Link to tracking Issue:** Fixes open-telemetry#31451 **Testing:** Updated existing tests, added a couple new ones. Also, the original issue filed was comparing `DBA_TABLESPACE_USAGE_METRICS` output for percent used to what we got from `usage/limit * 100`. Here's the local testing outputs compared to show they now line up. ``` 2024-03-27T16:31:57.938-0700 info oracledbreceiver/scraper.go:285 DBA_TABLESPACE_USAGE_METRICS: Tablespace name: SYSTEM, used space: 111288, tablespace size: 3518587, percent used: 3.16286054600895188892586711654422641816 {"kind": "receiver", "name": "oracledb", "data_type": "metrics"} ``` ``` Metric open-telemetry#20 Descriptor: -> Name: oracledb.tablespace_size.usage -> Description: Used tablespace in bytes. -> Unit: By -> DataType: Gauge NumberDataPoints #0 Data point attributes: -> tablespace_name: Str(SYSTEM) StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC Value: 911671296 ``` ``` Metric open-telemetry#19 Descriptor: -> Name: oracledb.tablespace_size.limit -> Description: Maximum size of tablespace in bytes, -1 if unlimited. -> Unit: By -> DataType: Gauge NumberDataPoints #0 Data point attributes: -> tablespace_name: Str(SYSTEM) StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC Value: 28824264704 ``` Doing the same calculation, we get: ``` (911671296 / 28824264704) * 100 = ~3.16% ```

dloucasfx added bug Something isn't working needs triage New item requiring triage labels Feb 27, 2024

github-actions bot added the receiver/oracledb label Feb 27, 2024

crobert-1 removed the needs triage New item requiring triage label Feb 27, 2024

This was referenced Mar 5, 2024

Weekly Report: 2024-02-27 - 2024-03-05 #31560

Closed

Weekly Report: 2024-02-27 - 2024-03-05 asuresh4/opentelemetry-collector-contrib#11543

Open

crobert-1 mentioned this issue Mar 28, 2024

[receiver/oracledb] Fix incorrect values for a couple of metrics #32028

Merged

dmitryax closed this as completed in #32028 Mar 28, 2024

shmsr mentioned this issue Jun 6, 2024

[oracle] Missing total.bytes field in tablespace datastream elastic/beats#39787

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[receiver/oracledb] discrepancy between the metric values in DBA_TABLESPACE_USAGE_METRICS and the ones calculated by the receiver #31451

[receiver/oracledb] discrepancy between the metric values in DBA_TABLESPACE_USAGE_METRICS and the ones calculated by the receiver #31451

dloucasfx commented Feb 27, 2024 •

edited

Loading

github-actions bot commented Feb 27, 2024

crobert-1 commented Feb 27, 2024 •

edited

Loading

crobert-1 commented Mar 27, 2024

crobert-1 commented Mar 27, 2024

[receiver/oracledb] discrepancy between the metric values in DBA_TABLESPACE_USAGE_METRICS and the ones calculated by the receiver #31451

[receiver/oracledb] discrepancy between the metric values in DBA_TABLESPACE_USAGE_METRICS and the ones calculated by the receiver #31451

Comments

dloucasfx commented Feb 27, 2024 • edited Loading

Component(s)

What happened?

Description

Steps to Reproduce

Expected Result

Actual Result

Collector version

github-actions bot commented Feb 27, 2024

crobert-1 commented Feb 27, 2024 • edited Loading

crobert-1 commented Mar 27, 2024

Background investigation

Proposed solution

crobert-1 commented Mar 27, 2024

dloucasfx commented Feb 27, 2024 •

edited

Loading

crobert-1 commented Feb 27, 2024 •

edited

Loading