Clarify the wording of the Rails/PluckInWhere cop #1307

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

padarom
Contributor

@padarom padarom commented Jul 4, 2024

Clarifies the existing wording to be easier to understand and adds an additional paragraph about possible performance implications of this cop.

These implications have been mentioned before in #310 (comment), but weren't explicitly mentioned in the cop's documentation.


The linked issue mentions MySQL, and I have found the same problem reported for MySQL several times on Stack Overflow, but I ran into it on PostgreSQL today. The query was along these lines, albeit a bit more complicated:

query = %{
  (type = ? AND model_id IN (?)) OR
  (type = ? AND model_id IN (?))
}

Foo.where(query, 'bar', bar_ids,
                 'baz', Baz.where(condition: true).pluck(:id))

The pluck in the last line was replaced with a select, which caused the Baz query to be run as a subquery instead of eagerly. This resulted in major database load, as the Foo table in this case is around 200 GB in size and the query was now using a sequential scan rather than an index scan.
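
To make the difference concrete, here is a minimal sketch in plain Ruby (no Rails required) of the two SQL shapes involved. The helpers `in_list` and `in_subquery` are hypothetical and only approximate what Active Record generates; the table name `bazs` is likewise an assumption for illustration.

```ruby
# Hypothetical helpers that approximate the SQL shape of each variant.
# They are illustrations only, not Active Record's actual SQL generation.

# pluck(:id) runs the Baz query first and returns an Array of ids,
# so the outer query receives a static list of values.
def in_list(column, ids)
  "#{column} IN (#{ids.map { |id| Integer(id) }.join(', ')})"
end

# select(:id) returns a relation, which is embedded as a subquery
# and evaluated by the database as part of the outer query.
def in_subquery(column, subquery_sql)
  "#{column} IN (#{subquery_sql})"
end

# The table name "bazs" is assumed for this example.
plucked  = in_list('model_id', [1, 2, 3])
selected = in_subquery('model_id', 'SELECT id FROM bazs WHERE condition = TRUE')

puts plucked
puts selected
```

The first string contains only literal values, so the planner can match them against an index on `model_id`. The second leaves the planner to decide how to evaluate the subquery, which is where the regression described above came from.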

Some research led us to find the related description in the PostgreSQL documentation:

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result.

This means that queries such as SELECT * FROM foo WHERE model_id IN (subquery) can end up scanning foo sequentially and comparing each row's model_id against the subquery result, rather than using an index scan. Running the Baz query eagerly with pluck was much faster in this case, as it allowed an index lookup against the static list of IDs passed to the query.

Note to future readers: Rather than using column IN (subquery) you can use column = ANY(ARRAY(subquery)), which causes the subquery to be executed first and allows for an index scan.
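
As a rough sketch of how that rewrite could be built from Ruby, the hypothetical helper `any_array` below wraps a subquery's SQL in the `= ANY(ARRAY(...))` form. The helper name and the table name `bazs` are assumptions for illustration. In a Rails app the subquery string could come from a relation's `to_sql`.

```ruby
# Hypothetical helper: wraps a subquery so that it is collected into an
# array first and then compared with = ANY(...).
# Only pass trusted SQL here, since the string is interpolated as-is.
def any_array(column, subquery_sql)
  "#{column} = ANY(ARRAY(#{subquery_sql}))"
end

# The table name "bazs" is assumed for this example.
condition = any_array('model_id', 'SELECT id FROM bazs WHERE condition = TRUE')
puts condition

# In a Rails app, the subquery SQL could be taken from a relation, e.g.:
#   Foo.where(any_array('model_id', Baz.where(condition: true).select(:id).to_sql))
```

Whether this form actually yields an index scan depends on the planner, table statistics, and available indexes, so it is worth confirming with EXPLAIN on the real query.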


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.
  • If this is a new cop, consider making a corresponding update to the Rails Style Guide.

@padarom padarom force-pushed the fix-pluck-in-where-documentation branch from 1202c72 to a8cf910 Compare July 5, 2024 06:37
@koic koic merged commit 3e91c03 into rubocop:master Jul 8, 2024
14 checks passed
@koic
Member

koic commented Jul 8, 2024

Thanks!
