🏛️ Bundle 2024-01
Object equality

The library "dessia_common" offers various concepts of equality between objects inheriting from "DessiaObject" or "PhysicalObject". For example, if we have two instances "ball1" and "ball2" of the class "Ball" and we want to test the equality between the two objects, different approaches are possible. We can test for equality based on the object data: we compare the constructor attributes of the two objects with each other, and the two objects are declared equal if the attributes are equal. Alternatively, we can test for equality based on the memory address of the objects, where two objects will be equal if they have the same memory address.

The generic objects of "dessia_common" provide a class attribute "_eq_is_data_eq" that controls this equality behavior transparently. With "_eq_is_data_eq = True", the "==" operator compares two objects based on their data. With "_eq_is_data_eq = False", it compares their memory addresses instead.
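To make the role of the flag concrete, here is a minimal plain-Python sketch that does not depend on "dessia_common". The base class "SketchObject" and its subclasses are hypothetical stand-ins that only mimic the described switch between data equality and memory-address equality; this is not the library's implementation.

```python
class SketchObject:
    """Hypothetical stand-in for a dessia_common base class; not the library's code."""

    _eq_is_data_eq = True  # class-level switch, overridable in subclasses

    def _data_eq(self, other) -> bool:
        # Data equality: same class name and same attribute values
        return (self.__class__.__name__ == other.__class__.__name__
                and vars(self) == vars(other))

    def __eq__(self, other) -> bool:
        if self._eq_is_data_eq:
            return self._data_eq(other)
        return self is other  # identity, i.e. same memory address

    # Defining __eq__ without __hash__ makes instances unhashable in Python;
    # hashing is deliberately left out of this sketch.


class DataSphere(SketchObject):
    _eq_is_data_eq = True

    def __init__(self, diameter: float):
        self.diameter = diameter


class AddressSphere(SketchObject):
    _eq_is_data_eq = False

    def __init__(self, diameter: float):
        self.diameter = diameter


data_result = DataSphere(0.01) == DataSphere(0.01)           # True: same data
address_result = AddressSphere(0.01) == AddressSphere(0.01)  # False: two distinct objects
```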

If we want to use data equality but exclude certain attributes from it, we can list those attributes in the class attribute "_non_data_eq_attributes". For example, to exclude the "check" and "name" attributes of the "Bearing" class from the equality calculation, the class can be written as follows:

from typing import List, Tuple

from dessia_common.core import PhysicalObject


class Bearing(PhysicalObject):
    _eq_is_data_eq = True                        # compare instances on their data
    _non_data_eq_attributes = ['check', 'name']  # ignored by the data comparison

    def __init__(self, balls: List[float],
                 positions: Tuple[float, float, float],
                 check: bool = True,
                 diameter: float = 0.01,
                 name: str = ""):
        self.balls = balls
        self.positions = positions
        self.check = check
        self.diameter = diameter
        PhysicalObject.__init__(self, name=name)
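One way such an exclusion list can be honored is to drop the listed attributes before comparing the remaining ones. The sketch below assumes this filtering approach; "SketchObject", "SketchBearing" and the helper "_eq_dict" are hypothetical names used only for illustration, not the library's API.

```python
from typing import Any, Dict, List


class SketchObject:
    """Hypothetical base class; only illustrates honoring an exclusion list."""

    _non_data_eq_attributes: List[str] = ['name']

    def _eq_dict(self) -> Dict[str, Any]:
        # Keep only the attributes that take part in data equality
        return {key: value for key, value in vars(self).items()
                if key not in self._non_data_eq_attributes}

    def __eq__(self, other) -> bool:
        return (self.__class__ is other.__class__
                and self._eq_dict() == other._eq_dict())


class SketchBearing(SketchObject):
    _non_data_eq_attributes = ['check', 'name']

    def __init__(self, balls, positions, check=True, diameter=0.01, name=""):
        self.balls = balls
        self.positions = positions
        self.check = check
        self.diameter = diameter
        self.name = name


bearing1 = SketchBearing([0.004, 0.004], (0.0, 0.1, 0.2), check=True, name="left")
bearing2 = SketchBearing([0.004, 0.004], (0.0, 0.1, 0.2), check=False, name="right")
bearing3 = SketchBearing([0.005, 0.005], (0.0, 0.1, 0.2), name="left")
```

Here "bearing1 == bearing2" holds because only "check" and "name" differ, while "bearing1 == bearing3" does not because the balls differ.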

It is also possible to redefine how the data of two objects are compared by implementing the "_data_eq" method, as follows:

from dessia_common.core import DessiaObject


class Ball(DessiaObject):
    _standalone_in_db = False

    def __init__(self, diameter: float,
                 name: str = ""):
        self.diameter = diameter
        DessiaObject.__init__(self, name=name)

    def _data_eq(self, other_object) -> bool:
        # Objects of different classes are never equal
        if other_object.__class__.__name__ != self.__class__.__name__:
            return False
        # Two balls are equal when their diameters match (the name is ignored)
        return self.diameter == other_object.diameter

When data equality is enabled, the "==" operator of "dessia_common" objects performs a preliminary check based on the hash of the objects (this is done by the library; the plain Python "==" operator does not compare hashes). Using the example of "ball1" and "ball2", two instances of the "Ball" class: when checking for equality between these two objects, the hashes of both objects are computed first. If the hashes are different, the objects cannot be equal and the "_data_eq" method is not executed. If the hashes are the same, the "_data_eq" method is executed to confirm whether the two objects are equal. This approach speeds up equality testing, because computing a hash is much faster than running "_data_eq".
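The hash-first strategy can be sketched in plain Python. The "SketchBall" class and its call counter are hypothetical and only illustrate the principle; they are not the library's implementation.

```python
class SketchBall:
    """Hypothetical class: == compares hashes first, then runs the costly check."""

    data_eq_calls = 0  # class-level counter, for illustration only

    def __init__(self, diameter: float):
        self.diameter = diameter

    def __hash__(self) -> int:
        return hash(self.diameter)

    def _data_eq(self, other: "SketchBall") -> bool:
        SketchBall.data_eq_calls += 1  # count how often the expensive path runs
        return self.diameter == other.diameter

    def __eq__(self, other) -> bool:
        if self.__class__ is not other.__class__:
            return False
        if hash(self) != hash(other):
            return False  # different hashes: cannot be equal, _data_eq is skipped
        return self._data_eq(other)  # same hash: confirm with the full comparison


different = SketchBall(0.01) == SketchBall(0.02)
calls_after_different = SketchBall.data_eq_calls  # 0: rejected by the hash check
same = SketchBall(0.01) == SketchBall(0.01)
calls_after_same = SketchBall.data_eq_calls       # 1: hashes matched, _data_eq ran
```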

By default, "dessia_common" provides a hash computation that takes into account all of the object's attributes. This computation should stay simple so that it remains fast. It can be customized through the class attribute "_non_data_hash_attributes", which, as for equality, lists the attributes to exclude from the hash computation.
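A practical consequence of the hash pre-check described above: an attribute excluded from equality should generally also be excluded from the hash. Otherwise, two objects that are equal in data can get different hashes, and the pre-check declares them different before "_data_eq" is even called. The following sketch illustrates this with hypothetical classes ("SketchObject", "ConsistentPart", "InconsistentPart") that only mimic the two exclusion lists; it is not the library's code.

```python
from typing import Any, List, Tuple


class SketchObject:
    """Hypothetical base class combining a hash pre-check with exclusion lists."""

    _non_data_eq_attributes: List[str] = []
    _non_data_hash_attributes: List[str] = []

    def _values(self, excluded: List[str]) -> Tuple[Any, ...]:
        # Attribute values in a stable (sorted by name) order, minus excluded ones
        return tuple(value for key, value in sorted(vars(self).items())
                     if key not in excluded)

    def __hash__(self) -> int:
        return hash(self._values(self._non_data_hash_attributes))

    def __eq__(self, other) -> bool:
        if self.__class__ is not other.__class__ or hash(self) != hash(other):
            return False
        return (self._values(self._non_data_eq_attributes)
                == other._values(other._non_data_eq_attributes))


class ConsistentPart(SketchObject):
    _non_data_eq_attributes = ['name']
    _non_data_hash_attributes = ['name']  # same exclusions as for equality

    def __init__(self, diameter: float, name: str = ""):
        self.diameter = diameter
        self.name = name


class InconsistentPart(ConsistentPart):
    _non_data_hash_attributes: List[str] = []  # 'name' still enters the hash


consistent = ConsistentPart(0.01, name="a") == ConsistentPart(0.01, name="b")
inconsistent = InconsistentPart(0.01, name="a") == InconsistentPart(0.01, name="b")
```

With matching exclusion lists the two parts compare equal; when "name" still enters the hash, the pre-check rejects them even though their data are equal.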

If needed, the hash computation can also be redefined through the "__hash__" method. The following example provides a specific hash computation for the "Bearing" object:

from typing import List, Tuple

from dessia_common.core import PhysicalObject


class Bearing(PhysicalObject):
    _eq_is_data_eq = True
    _non_data_eq_attributes = ['check', 'name']

    def __init__(self, balls: List[float],
                 positions: Tuple[float, float, float],
                 check: bool = True,
                 diameter: float = 0.01,
                 name: str = ""):
        self.balls = balls
        self.positions = positions
        self.check = check
        self.diameter = diameter
        PhysicalObject.__init__(self, name=name)

    def __hash__(self):
        # Combine the diameter with the hash of each ball; "check" and "name"
        # are left out, in line with _non_data_eq_attributes
        _hash = hash(self.diameter)
        for ball in self.balls:
            _hash += hash(ball)
        return _hash

Choosing a good hash matters. If all instances of a class (for example, "Bearing" objects) share the same hash, then when searching for an instance in the database, the hash comparison cannot tell the objects apart, and the "_data_eq" method has to be evaluated against every "Bearing" object. These operations can significantly slow down the platform.
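The effect can be measured with a small plain-Python experiment. In this sketch, the "Part" class is hypothetical, a Python list stands in for the database, and the hash function is injected so that a constant hash and a data-based hash can be compared while searching for one object among 1,000.

```python
from typing import Callable, List


class Part:
    """Hypothetical object whose hash function is injected, to compare strategies."""

    data_eq_calls = 0  # class-level counter of expensive comparisons

    def __init__(self, diameter: float, hash_function: Callable[["Part"], int]):
        self.diameter = diameter
        self._hash_function = hash_function

    def __hash__(self) -> int:
        return self._hash_function(self)

    def _data_eq(self, other: "Part") -> bool:
        Part.data_eq_calls += 1
        return self.diameter == other.diameter

    def __eq__(self, other) -> bool:
        # Hash pre-check first, full comparison only when the hashes match
        if not isinstance(other, Part) or hash(self) != hash(other):
            return False
        return self._data_eq(other)


def count_data_eq_calls(hash_function: Callable[[Part], int], size: int = 1000) -> int:
    """Look up one object in a list standing in for the database; count _data_eq calls."""
    database: List[Part] = [Part(float(i), hash_function) for i in range(size)]
    target = Part(float(size - 1), hash_function)  # matches the last stored object
    Part.data_eq_calls = 0
    if target not in database:  # linear search: one == per stored object until a match
        raise LookupError("target not found")
    return Part.data_eq_calls


constant_calls = count_data_eq_calls(lambda part: 0)                   # every hash collides
diverse_calls = count_data_eq_calls(lambda part: hash(part.diameter))  # hashes follow the data
```

With the constant hash, all 1,000 stored objects reach "_data_eq"; with the diameter-based hash, only the matching object does.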

To monitor the quality of object hashes on the platform, the admin/hash-warnings tab (see the documentation on platform usage) gives a global view of hash diversity for each object type. The screenshot below shows this tab: the "Car" object has poor hash diversity, while the "Dataset" object has good hash diversity.

[Screenshot: Hash warnings tab]
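The exact indicator displayed in the admin/hash-warnings tab is not detailed here. As an assumption, a simple way to quantify hash diversity is the ratio of distinct hashes to the number of objects, as in the hypothetical helper "hash_diversity" below (the two item classes are also hypothetical).

```python
from typing import Iterable


def hash_diversity(objects: Iterable[object]) -> float:
    """Hypothetical metric: number of distinct hashes divided by number of objects.

    1.0 means every object has its own hash; values close to 0 mean most objects
    share a hash, so equality checks often fall back to the full comparison.
    """
    hashes = [hash(obj) for obj in objects]
    if not hashes:
        return 1.0  # assumption: an empty collection counts as fully diverse
    return len(set(hashes)) / len(hashes)


class PoorHashItem:
    """Hypothetical class whose hash ignores its data."""

    def __init__(self, mass: float):
        self.mass = mass

    def __hash__(self) -> int:
        return 1  # every instance collides


class GoodHashItem(PoorHashItem):
    """Hypothetical class whose hash is derived from its data."""

    def __hash__(self) -> int:
        return hash(self.mass)


poor = hash_diversity(PoorHashItem(float(m)) for m in range(100))  # 0.01
good = hash_diversity(GoodHashItem(float(m)) for m in range(100))  # 1.0
```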