Suman Awal December 13, 2024

Hash and Array Lookup: Using Them Efficiently

Hashes and arrays are widely used data structures in Ruby, and when working with data you'll frequently need to look up values in them. In this blog, we'll walk through uploading a CSV file and updating records from its contents, and use that scenario to show when a hash lookup is more efficient than an array lookup.

Hashes

Hashes in Ruby are collections of key-value pairs, designed for fast lookups by key: retrieving a value takes constant time on average, regardless of how large the hash is.

Example:


hash = { name: 'Joe', age: 50, position: 'CTO' }

# Retrieve value based on key:
puts hash[:age] 

# Output
50
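Looking up a key that is not present does not raise an error with `[]`. It simply returns `nil`. This matters later in this post, where a hash lookup is used to check whether a SKU exists in the CSV. A small sketch of the common lookup methods, reusing the example hash above (`:salary` is just an illustrative missing key):

```ruby
hash = { name: 'Joe', age: 50, position: 'CTO' }

# [] returns nil when the key is missing
p hash[:salary]            # => nil

# key? checks for presence without reading the value
p hash.key?(:age)          # => true

# fetch returns the default when the key is missing...
p hash.fetch(:salary, 0)   # => 0

# ...and raises KeyError when no default is given
begin
  hash.fetch(:salary)
rescue KeyError => e
  puts "KeyError: #{e.message}"
end
```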


Arrays

Arrays are ordered collections of objects, accessed by integer index (starting at 0).

Example:



array = [1,2,3]

# Retrieve value based on position
puts array[1]

# Output
2
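Reading by position is fast, but finding an element by its *value* means scanning the array from the start until a match is found, so the time grows with the array's length (O(n)). The sketch below uses `Array#find` and `Array#index` on a small, made-up list of item hashes:

```ruby
items = [
  { sku: 'nk-1', price: 100 },
  { sku: 'ad-7', price: 80 },
  { sku: 'pm-3', price: 60 }
]

# find walks the array element by element until the block returns true
match = items.find { |item| item[:sku] == 'pm-3' }
p match[:price]   # => 60

# index returns the position of the first match, or nil if there is none
p items.index { |item| item[:sku] == 'xx-0' }   # => nil
```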


Above you can see a hash and an array with their simplest use cases. Now let's look at two use cases that compare searching through an array with looking up a value in a hash.

Use Case I:

Let's assume you have an Item model with the following fields:

  • name
  • sku (unique)
  • price
  • supplier

You need to create or update each record in the database from the CSV data, matching on sku. The steps are:

  • Import the CSV data.
  • Find the item by its sku.
  • If the item exists, update its attributes (such as the price).
  • If the item does not exist, create a record.

Here is a code snippet to perform the above action.


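# Example csv content
# name,sku,price,supplier
# Nike Star,nk-1,100,Nike

require 'csv'
csv_path = 'items.csv' # full path of the csv file

# header_converters: :symbol turns the headers into symbol keys (:name, :sku, ...);
# without it, row.to_h uses string keys and item[:sku] would be nil
items = CSV.read(csv_path, headers: true, header_converters: :symbol).map(&:to_h)

# This will assign csv data to items in the following format.
# Field values are strings; ActiveRecord casts price to the column type on assignment.
# items = [{ name: 'Nike Star', sku: 'nk-1', price: '100', supplier: 'Nike' }, ...]

# Let's find or create each item and update it if necessary

items.each do |item|
  sku = item[:sku]
  record = Item.find_or_initialize_by(sku: sku) # one query per CSV row
  record.assign_attributes(
    name: item[:name],
    supplier: item[:supplier],
    price: item[:price]
  )
  record.save # returns false (without raising) if validations fail
end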

# This will create or update the records based on the sku.

In the above example, we iterate over the array of CSV rows and, for each row, find or initialize the record by sku and then assign its other attributes. Every row of the CSV is processed, and each one costs a database query.
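It is worth checking what the parsing step actually returns. With `headers: true` alone, `CSV::Row#to_h` produces string keys (`'sku'`), so `item[:sku]` would be `nil`; passing `header_converters: :symbol` gives the symbol keys the loop expects. A minimal check, assuming an inline CSV string in place of `items.csv`:

```ruby
require 'csv'

# Inline stand-in for items.csv (same header as the example above)
csv_text = <<~TEXT
  name,sku,price,supplier
  Nike Star,nk-1,100,Nike
TEXT

# Without a header converter, the keys are strings
plain = CSV.parse(csv_text, headers: true).map(&:to_h)
p plain.first['sku']   # => "nk-1"
p plain.first[:sku]    # => nil

# With header_converters: :symbol, the keys are symbols; values are still strings
items = CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)
p items.first[:sku]    # => "nk-1"
p items.first[:price]  # => "100"
```

Note that `price` comes back as the string `"100"`, not the integer shown in some examples; when it is assigned to an ActiveRecord attribute, it is cast to the column's type.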

Use Case II:

Let's assume you have an Item model with the following fields:

  • name
  • sku (unique)
  • price
  • stale (boolean)

You receive a CSV file from a third party. This time, instead of creating or updating a record for every row in the CSV, you only need to update the records that are marked as stale in the application. There are two ways to approach this. The first follows the same process as the example above, with a minor modification. Here is a code snippet.



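# Assign items as in the above example (symbol keys via header_converters: :symbol)

items.each do |item|
  record = Item.find_by(sku: item[:sku]) # one query per CSV row

  next unless record&.stale # skip missing or non-stale records

  record.update(price: item[:price], stale: false)
end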


Explanation: This approach iterates through every row of the CSV and queries the database for each one, even though only the stale records need updating. Now let's discuss another approach, in which we convert the CSV data into a hash keyed by sku.



require 'csv'

csv_path = 'items.csv' # Full path of the csv

csv_data_hash = {}

# header_converters: :symbol makes row[:sku] work and gives to_h symbol keys
CSV.foreach(csv_path, headers: true, header_converters: :symbol) do |row|
  csv_data_hash[row[:sku]] = row.to_h
end

# This will create a csv_data_hash with the following structure
# { 'nk-1' => { name: 'Nike Star', sku: 'nk-1', price: '100' }, ... }

Next, find all the stale records in the database that need to be updated.

update_required_items = Item.where(stale: true)

Next, iterate through update_required_items, look up each item's sku in the CSV hash, and update the record if a matching row exists.

update_required_items.each do |item|
  csv_item = csv_data_hash[item.sku] # hash lookup, O(1) on average
  next unless csv_item # sku not present in the CSV

  item.update(price: csv_item[:price], stale: false)
end


This way, only the stale records are visited, and each one is matched to its CSV row with a single hash lookup instead of querying the database once for every CSV row.
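To see the whole flow without a database, here is a plain-Ruby sketch. A hypothetical `FakeItem` Struct stands in for the `Item` model, an array of them stands in for the table, and `select(&:stale)` plays the role of `Item.where(stale: true)`; the CSV content is made up and inlined. Only the stale records are visited, and each is matched against the CSV hash with one lookup:

```ruby
require 'csv'

# Hypothetical stand-in for the Item model (not ActiveRecord)
FakeItem = Struct.new(:sku, :price, :stale)

csv_text = <<~TEXT
  name,sku,price
  Nike Star,nk-1,100
  Adidas Run,ad-7,80
TEXT

# Build the sku => row hash, as in the CSV.foreach loop above
csv_data_hash = {}
CSV.parse(csv_text, headers: true, header_converters: :symbol) do |row|
  csv_data_hash[row[:sku]] = row.to_h
end

records = [
  FakeItem.new('nk-1', '90', true),   # stale and present in the CSV
  FakeItem.new('ad-7', '75', false),  # not stale: left alone
  FakeItem.new('zz-9', '10', true)    # stale but missing from the CSV
]

# Stand-in for Item.where(stale: true)
stale_items = records.select(&:stale)

stale_items.each do |item|
  csv_item = csv_data_hash[item.sku]  # one hash lookup per stale record
  next unless csv_item

  # Stand-in for item.update(price: csv_item[:price], stale: false)
  item.price = csv_item[:price]
  item.stale = false
end
```

After the loop, `nk-1` has the CSV price and is no longer stale, `ad-7` is untouched because it was never stale, and `zz-9` stays stale because the CSV has no row for it.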

Conclusion

From the above use cases, we can conclude that if you have a large dataset and need to process every row, simply iterating over the array of rows works well. But if you have a large dataset and only need to find a small set of rows in it, transforming the CSV data into a hash keyed by a unique field (here, sku) and looking rows up by key is the more efficient solution.

Here is a comparison between hash lookup and array lookup:

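| Feature                    | Using Hash                 | Using Array                |
|----------------------------|----------------------------|----------------------------|
| Lookup time                | O(1) per SKU (on average)  | O(n) per SKU (linear scan) |
| Space complexity           | Higher (stores hash keys)  | Lower                      |
| Suitability for large data | Best for large datasets    | Less efficient             |
| Implementation complexity  | Moderate                   | Simple                     |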
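The lookup-time row can be checked with Ruby's standard `Benchmark` module. This sketch (assuming Ruby 2.6 or later for `Array#to_h` with a block) builds synthetic records with made-up SKUs, keeps them both as an array and as a hash keyed by SKU, and looks up the same 100 SKUs each way. Absolute timings depend on the machine; the point is that the array version slows down as the list grows while the hash lookups stay roughly constant:

```ruby
require 'benchmark'

# Synthetic data: 10_000 rows with made-up SKUs
rows = (1..10_000).map { |i| { sku: "sku-#{i}", price: i } }

# Array lookup: linear scan per SKU (O(n))
def array_lookup(rows, sku)
  rows.find { |row| row[:sku] == sku }
end

# Hash lookup: build the index once (O(n)), then O(1) on average per SKU
index = rows.to_h { |row| [row[:sku], row] }

# 100 SKUs spread across the data set
wanted = (1..10_000).step(100).map { |i| "sku-#{i}" }

Benchmark.bm(6) do |x|
  x.report('array') { wanted.each { |sku| array_lookup(rows, sku) } }
  x.report('hash')  { wanted.each { |sku| index[sku] } }
end
```

Building the hash costs one pass over the data plus extra memory, which is the trade-off captured in the space-complexity row; it pays off once you perform more than a handful of lookups.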

 
