Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes. The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.
When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "flushes" the current state of objects (known as a Unit of Work).
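The change-tracking idea behind these property accessors can be sketched in plain modern Python. This is a conceptual illustration only, not SQLAlchemy's actual instrumentation; the Tracked class and flush() function are hypothetical names:

```python
class Tracked(object):
    """Sketch of attribute instrumentation: every assignment made
    after construction records the attribute name in a 'dirty' set."""
    def __init__(self, **kwargs):
        object.__setattr__(self, '_dirty', set())
        for key, value in kwargs.items():
            # initial population bypasses tracking, so a freshly
            # loaded object starts out "clean"
            object.__setattr__(self, key, value)

    def __setattr__(self, key, value):
        object.__setattr__(self, key, value)
        self._dirty.add(key)   # remember what changed since last flush

def flush(obj):
    """Pretend-flush: report the changed attributes, then mark clean."""
    changed = dict((key, getattr(obj, key)) for key in obj._dirty)
    obj._dirty.clear()
    return changed

user = Tracked(user_name='fred', password='secret')
user.user_name = 'fred jones'
print(flush(user))   # {'user_name': 'fred jones'} - only the modified attribute
```

A real mapper tracks far more (collections, deletions, identity), but the principle is the same: assignments are intercepted, and flush() consults the accumulated changes rather than re-examining every attribute.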
Two objects provide the primary interface for interacting with Mappers and the "unit of work": the Query object and the Session object. Query deals with selecting objects from the database, whereas Session provides a context for loaded objects and the ability to communicate changes on those objects back to the database.
The primary method on Query for loading objects is its select() method, which accepts arguments similar to those of a sqlalchemy.sql.Select object. Unlike a Select construct, however, this method executes immediately rather than awaiting an execute() call, and it returns a list of object instances rather than a cursor-like result object.
The three configuration elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application; the only requirement is that the class and table metadata are described before the mapper. For the sake of example, we will define these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.
Also, keep in mind that the examples in this section deal with explicit Session objects mapped directly to Engine objects, which represents the most explicit style of using the ORM. Options exist for how this is configured, including binding Table objects directly to Engines (described in Binding MetaData to an Engine), as well as using an auto-generating "contextual" session via the SessionContext plugin (described in SessionContext).
Starting with a Table definition and a minimal class construct, the two are associated with each other via the mapper() function [api], which generates an object called a Mapper. SA associates the class and all instances of that class with this particular Mapper, which is then stored in a global registry.
from sqlalchemy import *

# metadata
meta = MetaData()

# table object
users_table = Table('users', meta,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('fullname', String(100)),
    Column('password', String(20))
)

# class definition
class User(object):
    pass

# create a mapper and associate it with the User class.
mapper(User, users_table)
That's all for configuration. Next, we will create an Engine and bind it to a Session, which represents a local collection of mapped objects to be operated upon.
# engine
engine = create_engine('sqlite:///mydb.db')

# session
session = create_session(bind_to=engine)
The session represents a "workspace" which can load objects and persist changes to the database. A Session [doc] [api] is best created as local to a particular set of related data operations, such as scoped within a function call, or within a single application request cycle. Next we illustrate a rudimentary query which loads a single object instance. We will modify one of its attributes and persist the change back to the database.
# select
user = session.query(User).selectfirst_by(user_name='fred')
# modify
user.user_name = 'fred jones'

# flush - saves everything that changed
# within the scope of our Session
session.flush()
Things to note from the above: the loaded User object has an attribute named user_name on it, which corresponds to the user_name column in users_table; this attribute was configured at the class level when the Mapper we created first compiled itself. Our modify operation on this attribute caused the object to be marked as dirty, which was picked up automatically within the subsequent flush() process. The flush() is the point at which all changes to objects within the Session are persisted to the database; the User object is then no longer marked as dirty until it is again modified.
The method session.query(class_or_mapper) returns a Query object [api]. Query implements a large set of methods which are used to produce and execute select statements tailored for loading object instances. Depending on the type of query issued, it returns results either as a list of objects or as a single instance.
A Query is always initially generated starting with the Session that we're working with, and is always created relative to a particular class, which is the primary class we wish to load.
# get a query from a Session based on class:
query = session.query(User)
Alternatively, an actual Mapper instance can be specified instead of a class:
# locate the mapper corresponding to the User class
usermapper = class_mapper(User)

# create query for the User
query = session.query(usermapper)
Dealing with mappers explicitly as above is usually not needed except for more advanced patterns where a class may have multiple mappers associated with it.
Once we have a query, we can start loading objects. The two most rudimentary and general-purpose methods are select() and select_by(). select() is oriented towards query criteria constructed as a ClauseElement object, which is the kind of object generated when constructing SQL expressions as described in the SQL section. select_by() can also accommodate ClauseElement objects but is generally more oriented towards keyword arguments which correspond to mapped attribute names. In both cases, the criteria specified are used to construct the WHERE clause of the generated SQL. The Query object will use these criteria to compile the full SQL query issued to the database, combining the WHERE condition with the appropriate column selection clauses and FROM criteria, including whatever modifications are required to control ordering, number of rows returned, joins for loading related objects, etc.
The general form of select_by() is:
def select_by(self, *clause_elements, **keyword_criterion)
where *clause_elements is a set of zero or more ClauseElement objects, and **keyword_criterion are key/value pairs, each of which corresponds to a simple equality comparison. The full set of clause elements and key/value pairs are joined together in the resulting SQL statement via AND.
# using select_by with keyword arguments
result = query.select_by(user_name='john', fullname='John Smith')
# using select_by with preceding ClauseElements followed by keywords
result = query.select_by(users_table.c.user_id > 224,
                         users_table.c.user_name == 'john',
                         fullname='John Smith')
Note that a ClauseElement is generated by each boolean operator (i.e. ==, >) that's performed against a Column object - recall that this operator is overloaded to return a binary clause construct. The fullname='John Smith' argument, by contrast, is not using any kind of overloading, and is a regular Python keyword argument. Additionally, while ClauseElements are constructed against the Column elements configured on the mapped Table, the keyword-based criteria are constructed against the class-level attribute names which were configured by the mapper. While these attributes have the same names as their columns in this particular example, they can be configured to have names distinct from their columns. So it follows that using ClauseElements for criteria is closer to the relational side of things, while using keyword arguments is closer to the domain object side of things.
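The operator-overloading mechanism described above can be sketched in plain modern Python. The BinaryClause and Column classes below are illustrative stand-ins, not SQLAlchemy's actual ClauseElement implementation:

```python
class BinaryClause(object):
    """Sketch of a binary comparison node: column, operator, value."""
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def __repr__(self):
        return "%s %s %r" % (self.left.name, self.op, self.right)

class Column(object):
    """Sketch of a column whose comparison operators build clause
    objects instead of returning booleans."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return BinaryClause(self, '=', other)

    def __gt__(self, other):
        return BinaryClause(self, '>', other)

user_name = Column('user_name')
clause = (user_name == 'john')   # not True/False: a clause object
print(repr(clause))              # user_name = 'john'
```

This is why an expression like users_table.c.user_name == 'john' can be handed to select(): the comparison never evaluates to a boolean; it produces a data structure that the Query later compiles into SQL.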
The select() method, unlike the select_by() method, is purely ClauseElement/relationally oriented and has no domain-level awareness. Its basic argument signature:
def select(self, clause_element, **additional_options)
The clause_element argument again represents a ClauseElement which corresponds to the WHERE criterion of a SELECT statement. Here, there is only a single ClauseElement and it will be used to form the entire set of criteria. A basic select() operation looks like this:
result = query.select(users_table.c.user_name=='john')
To generate AND criteria the way select_by() does, you use the and_() construct from the sql construction system, which generates just another ClauseElement containing two sub-ClauseElements:
result = query.select(and_(users_table.c.user_name == 'john',
                           users_table.c.fullname == 'John Smith'))
From this, one can see that select() is stricter in its operation and does not make any assumptions about how to join multiple criteria together. Of course, in all cases where ClauseElement is used, any expression can be created using combinations of and_(), or_(), not_(), etc.
result = query.select(or_(users_table.c.user_name == 'john',
                          users_table.c.user_name == 'fred'))
The keyword portion of select() is used to indicate further modifications to the generated SQL, including common arguments like ordering, limits and offsets:
result = query.select(users_table.c.user_name == 'john',
                      order_by=[users_table.c.fullname],
                      limit=10, offset=12)
select() also provides an additional calling style, which is to pass a fully constructed Select construct to it.
# using a full select object
result = query.select(users_table.select(users_table.c.user_name == 'john'))
When a full select is passed, the Query object does not generate its own SQL. Instead, the select construct passed in is used without modification, except that its use_labels flag is switched on to prevent column name collisions. Therefore it is expected that the construct will include the proper column clause as appropriate to the mapped class being loaded.
The techniques used when querying with a full select construct are similar to another query method called instances(), which is described in Result-Set Mapping.
The twin calling styles presented by select() and select_by() are mirrored in several other methods on Query. For each method that indicates a verb such as select() and accepts a single ClauseElement followed by keyword-based options, the _by() version accepts a list of ClauseElement objects followed by keyword-based criteria. These include:
selectfirst() / selectfirst_by() - select with LIMIT 1 and return a single object instance

selectone() / selectone_by() - select with LIMIT 2, assert that only one row is present, and return a single object instance

filter() / filter_by() - apply the criteria to the Query object generatively, and return a new Query object with the criteria built in

count() / count_by() - return the total number of object instances that would be returned
The get() method loads a single instance, given the primary key value of the desired entity:
# load user with primary key 15
user = query.get(15)
Because it has the actual primary key value of the instance, the get() method can return an already-loaded instance from the Session without performing any SQL. It is the only method on Query that does not issue SQL to the database in all cases.
To issue a composite primary key to get(), use a tuple. The order of the arguments matches that of the table metadata.
myobj = query.get((27, 3, 'receipts'))
Another special method on Query is load(). This method has the same signature as get(), except it always refreshes the returned instance with the latest data from the database. This is in fact a unique behavior, since as we will see in the Session / Unit of Work chapter, the Query usually does not refresh the contents of instances which are already present in the session.
Some of the examples above illustrate the usage of the mapper's Table object to provide the columns for a WHERE clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like the table metadata to access the columns of the table:
userlist = session.query(User).select(User.c.user_id==12)
The filter() and filter_by() methods on Query are known as "generative" methods: instead of returning results, they return a newly constructed Query object which contains the "filter" criteria as part of its internal state. This is another way of stringing together criteria via AND clauses. But when used generatively, the Query becomes a much more flexible object, as there are generative methods for adding all kinds of criteria and modifiers, some relationally oriented and others domain oriented. The key behavior of a "generative" method is that calling it produces a new Query object, which contains all the attributes of the previous Query object plus some new modifications.
# create a Query
query = session.query(User)

# filter by user_name property using filter_by()
query = query.filter_by(user_name="john")

# filter by fullname column using filter()
query = query.filter(User.c.fullname == "John Smith")

# execute - the list() method returns a list.
# this is equivalent to just saying query.select()
result = query.list()
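The generative pattern itself is simple to sketch in plain Python: each call returns a new object carrying the accumulated state. This is an illustration of the pattern, not Query's real implementation, and GenerativeQuery is a hypothetical name:

```python
class GenerativeQuery(object):
    """Each modifier returns a *new* query object; the original is
    never mutated, so a partially-built query can be shared safely
    and extended in several directions."""
    def __init__(self, criteria=()):
        self.criteria = tuple(criteria)

    def filter(self, condition):
        # combine the old criteria with the new one in a fresh object
        return GenerativeQuery(self.criteria + (condition,))

base = GenerativeQuery()
q1 = base.filter("user_name = 'john'")
q2 = q1.filter("fullname = 'John Smith'")

print(base.criteria)   # () - the base query is unchanged
print(q2.criteria)     # both conditions, to be joined by AND at compile time
```

Because nothing is mutated in place, q1 can also be extended with entirely different criteria without disturbing q2.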
Other generative behavior includes list-based indexing and slice operations, which translate into LIMIT and OFFSET criteria:
session.query(User).filter(users_table.c.fullname.like('j%'))[20:30]
Iterable behavior:
for user in session.query(User).filter_by(user_name='john'):
    # etc.
Applying modifiers generatively:
session.query(User).filter(users_table.c.fullname.like('j%')).\
    limit(10).offset(20).order_by(users_table.c.user_id).list()
There is no restriction on mixing the "generative" methods like filter(), join(), order_by() etc. with the "non-generative" methods like select() and select_by(). The only difference is that the select... methods return results immediately, whereas generative methods return a new Query from which results can be retrieved using any number of methods such as list(), select(), select_by(), selectfirst(), etc. Options that are specified to the select... methods build upon options that were already built up generatively.
The Query object will be described further in subsequent sections dealing with joins. Also, be sure to consult the full generated documentation for Query.
When objects corresponding to mapped classes are created or manipulated, all changes are logged by the Session object. The changes are then written to the database when an application calls flush(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The flush() operation batches its SQL statements into a transaction, and can also perform optimistic concurrency checks (using a version id column) to ensure the proper number of rows were in fact affected.
The Unit of Work is a powerful tool, and has some important concepts that should be understood in order to use it effectively. See the Session / Unit of Work section for a full description on all its operations.
When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes. New objects by default must be explicitly added to the Session using the save() method:
mapper(User, users_table)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# create a Session and save them
sess = create_session()
sess.save(myuser)
sess.save(myuser2)

# load a third User from the database
myuser3 = sess.query(User).select(User.c.user_name == 'fred')[0]
myuser3.user_name = 'fredjones'

# save all changes
sess.flush()
The requirement that new instances be explicitly stored in the Session via the save() operation can be modified by using the SessionContext extension module.
The mapped class can also specify whatever methods and/or constructor it wants:
class User(object):
    def __init__(self, user_name, password):
        self.user_id = None
        self.user_name = user_name
        self.password = password

    def get_name(self):
        return self.user_name

    def __repr__(self):
        return "User id %s name %s password %s" % (
            repr(self.user_id), repr(self.user_name), repr(self.password))

mapper(User, users_table)

sess = create_session()
u = User('john', 'foo')
sess.save(u)
sess.flush()
>>> u User id 1 name 'john' password 'foo'
Note that the __init__() method is not called when the instance is loaded. This is so that classes can define operations that are specific to their initial construction which are not re-called when the object is restored from the database; it is similar in concept to how Python's pickle module calls __new__() when deserializing instances. To allow __init__() to be called at object load time, or to define any other sort of on-load operation, create a MapperExtension which supplies the create_instance() method (see Extending Mapper, as well as the example in the FAQ).
SQLAlchemy will only place columns into UPDATE statements for which the value of the attribute has changed. This conserves database traffic and also interacts properly with "deferred" attributes, which are mapped object attributes against the mapper's primary table that aren't loaded until referenced by the application.
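The changed-columns-only behavior can be sketched as follows, assuming a dictionary of dirty attribute values like the one a mapper tracks internally. The build_update() function is an illustrative name, not SQLAlchemy API:

```python
def build_update(table, pk_name, pk_value, dirty):
    """Render an UPDATE statement containing only the changed columns.
    'dirty' maps column name -> new value; an empty dict means no
    statement is emitted at all."""
    if not dirty:
        return None   # nothing changed: no SQL for this object
    cols = sorted(dirty)
    assignments = ", ".join("%s = ?" % col for col in cols)
    sql = "UPDATE %s SET %s WHERE %s = ?" % (table, assignments, pk_name)
    params = [dirty[col] for col in cols] + [pk_value]
    return sql, params

print(build_update('users', 'user_id', 15, {'user_name': 'fred jones'}))
# ("UPDATE users SET user_name = ? WHERE user_id = ?", ['fred jones', 15])
```

Columns that were never loaded (deferred) or never assigned simply never appear in the dirty set, so they are left out of the statement entirely.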
So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation function [doc][api] provided by the orm module.
With our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:
from sqlalchemy import *

metadata = MetaData()

# define user table
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)
Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column in a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, ForeignKey relationships are the default method by which the relationship is determined (options also exist to describe the relationships explicitly).
So then let's define two classes: the familiar User class, as well as an Address class:
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street, city, state, zip):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip
And then a Mapper that will define a relationship of the User and the Address classes to each other, as well as to their table metadata. We will add an additional mapper keyword argument properties, which is a dictionary relating the names of class attributes to database relationships, in this case a relation object against a newly defined mapper for the Address class:
mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'addresses': relation(Address)
})
Let's do some operations with these classes and see what happens:
engine = create_engine('sqlite:///mydb.db')
metadata.create_all(engine)

session = create_session(bind_to=engine)

u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

session.save(u)
session.flush()
A lot just happened there! The Mapper figured out how to relate rows in the addresses table to the users table, and upon flush it also had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have their new primary and foreign key attributes populated.
Also notice that when we created a Mapper on the User class which defined an addresses relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a Python property which will return an instance of sqlalchemy.orm.attributes.InstrumentedList. This is a generic collection-bearing object which can represent lists, sets, dictionaries, or any user-defined collection class which has an append() method. By default it represents a list:
del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))
session.flush()
Note that when creating a relation with the relation() function, the target can either be a class, in which case the primary mapper for that class is used as the target, or a Mapper instance itself, as returned by the mapper() function.
In the previous example, a single address was removed from the addresses attribute of a User object, resulting in the corresponding database row being updated to have a user_id of None. But now there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the cascade parameter of relation:
# clear mappers from the previous example
clear_mappers()

mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'addresses': relation(Address, cascade="all, delete-orphan")
})

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))
session.flush()
In this case, with the delete-orphan cascade rule set, the element that was removed from the addresses list was also removed from the database. Specifying cascade="all, delete-orphan" means that every persistence operation performed on the parent object will be cascaded to the child object or objects handled by the relation, and additionally that each child object cannot exist without being attached to a parent. Such a relationship indicates that the lifecycle of the Address objects is bounded by that of their parent User object.
Cascading is described fully in Cascade rules.
By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, independently of database operations. Below, the User mapper is created with an addresses property, and the corresponding Address mapper receives a "backreference" to the User object via the property name user:
mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'addresses': relation(Address, backref='user')
})

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True
The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property (i.e. a second relation()) is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:
mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user': relation(User, backref=backref('addresses', cascade="all, delete-orphan"))
})
The backref() function is often used to set up a bi-directional one-to-one relationship. This is because the relation() function by default creates a "one-to-many" relationship when presented with a primary key/foreign key relationship, but the backref() function can redefine the uselist property to make it a scalar:
mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user': relation(User, backref=backref('address', uselist=False))
})
We've seen how the relation specifier affects the saving of an object and its child items; how does it affect selecting them? By default, the relation keyword indicates that a lazy loader should be attached to the related property when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.
# define a mapper
mapper(User, users_table, properties={
    'addresses': relation(mapper(Address, addresses_table))
})

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = session.query(User).select(User.c.user_name == 'jane')[0]
# iterate through the User object's addresses. this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)
When using mappers that have relationships to other mappers, the need arises to specify query criteria across multiple tables. SQLAlchemy provides several core techniques which offer this functionality.
When specifying columns to the select() method (including variants like selectfirst(), selectone(), etc.) or the generative filter() method of Query, if the columns are attached to a table other than the mapped table, that table is automatically added to the "FROM" clause of the query. This is the same behavior that occurs when creating a non-ORM select object. Using this feature, joins can be created when querying:
l = session.query(User).select(and_(users_table.c.user_id == addresses_table.c.user_id,
                                    addresses_table.c.street == '123 Green Street'))
Above, we specified selection criteria that included columns from both the users and the addresses tables. Note that in this case, we had to specify not just the matching condition to the street column on addresses, but also the join condition between the users and addresses tables. Even though the User mapper has a relationship to the Address mapper where the join condition is known, the select method does not assume how you want to construct joins. If we did not include the join clause, we would get:
# this is usually not what you want to do
l = session.query(User).select(addresses_table.c.street == '123 Green Street')
The above query will return all rows of the users table, even those that do not correspond to the addresses table, in a cartesian product with the matching rows from the addresses table.
Another way to specify joins more explicitly is to use the from_obj parameter of select(), or the generative select_from() method. These allow you to explicitly place elements in the FROM clause of the query, which could include lists of tables and/or Join constructs:
l = session.query(User).select(addresses_table.c.street == '123 Green Street',
                               from_obj=[users_table.join(addresses_table)])
In the above example, the join() function by default creates a natural join between the two tables, so we were able to avoid having to specify the join condition between users and addresses explicitly.
Using a generative approach:
l = session.query(User).filter(addresses_table.c.street == '123 Green Street').\
    select_from(users_table.join(addresses_table)).list()
Note that select_from() takes either a single scalar element or a list of selectables (i.e. select_from([table1, table2, table3.join(table4), ...])), which are added to the current list of "from" elements. As is the behavior of constructed SQL, join elements used in the from_obj parameter or the select_from() method will replace instances of the individual tables they represent.
Another way that joins can be created is by using the select_by method or the generative filter_by method of Query, which have the ability to create joins across relationships automatically. These methods are in many circumstances more convenient than, but not as flexible as, the more SQL-level approach using the select()/select_from() methods described in the previous section.
To issue a join using select_by(), just specify a key in the keyword-based argument list which is not present in the primary mapper's list of properties or columns, but is present in the property list of some relationship down the line of objects. The Query object will recursively traverse along the mapped relationships starting with the lead class and descending into child classes, until it finds a property matching the given name. For each new mapper it encounters along the path to the located property, it constructs a join across that relationship:
l = session.query(User).select_by(street='123 Green Street')
The above example is shorthand for:
l = session.query(User).select(and_(
    Address.c.user_id == User.c.user_id,
    Address.c.street == '123 Green Street'))
select_by() and its related functions can compare not only column-based attributes to column-based values, but also relations to object instances:
# get an instance of Address. assume its primary key identity
# is 12.
someaddress = session.query(Address).get_by(street='123 Green Street')

# look for User instances which have the
# "someaddress" instance in their "addresses" collection
l = session.query(User).select_by(addresses=someaddress)
Above, the comparison denoted by addresses=someaddress is constructed by comparing all the primary key columns in the Address mapper to each corresponding primary key value in the someaddress entity. In other words, it's equivalent to saying select_by(address_id=someaddress.address_id).
The join() method is a generative method which can apply join conditions to a Query based on the names of relationships, using a similar mechanism to that of select_by() and similar methods. By specifying the string name of a relation as its only argument, the resulting Query will automatically join from the starting class' mapper to the target mapper, indicated by searching for a relationship of that name along the relationship path.
l = session.query(User).join('addresses').list()
One drawback of this method of locating a relationship is that it's not deterministic. If the same relationship name occurs on more than one traversal path, it's only possible to locate one of those relationships. Similarly, if relationships are added to mapped classes, queries that worked fine may suddenly experience a similar conflict and produce unexpected results.
So to specify a deterministic path to join(), send the relationship name or a path of names as a list, such as:
l = session.query(User).join(['addresses']).list()
Above, if the "addresses" relation is not present directly on the User class, an error is raised.
To traverse more deeply into relationships, specify multiple relationship names in the order in which they are constructed:
orders = session.query(Order).join(['customer', 'addresses']).select_by(email_address="foo@bar.com")
For those familiar with older versions of Query, the join() method is an easier-to-use version of the join_by(), join_to() and join_via() methods, all of which produce ClauseElements that can be constructed into query criteria. Consult the generated documentation for information on these methods.
(new in 0.3.7) To help in navigating collections, the with_parent() generative method adds criteria corresponding to instances which belong to a particular parent. This method makes use of the same "lazy loading" criteria used to load relationships normally, which means that for a typical non-many-to-many relationship it will not actually create a join, and instead places bind parameters at the point at which the parent table's columns are normally specified. This means you get a lighter-weight query, which also works with self-referential relationships that would otherwise require an explicit alias object in order to create self-joins. For example, to load all the Address objects which belong to a particular User:
# load a user
someuser = session.query(User).get(2)

# load the addresses of that user
addresses = session.query(Address).with_parent(someuser).list()

# filter the results
someaddresses = session.query(Address).with_parent(someuser).filter_by(email_address="foo@bar.com").list()
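The idea behind this reuse of the lazy-loading criterion can be sketched in plain Python. The following is an illustration only, not SQLAlchemy's actual internals: instead of joining the child table to the parent table, the parent's primary key value is bound directly into the child table's foreign key criterion.

```python
# Illustrative sketch only -- not SQLAlchemy's internals.  The "lazy
# loading" criterion for a one-to-many relation is a clause such as
# "addresses.user_id = :param"; with_parent() reuses that clause,
# binding the parent's primary key value in place of a join.

def lazy_criterion(child_fk_column, parent_pk_value):
    # build the WHERE clause and its bind parameters
    clause = "%s = :param" % child_fk_column
    params = {"param": parent_pk_value}
    return clause, params

# for a User with user_id=2, the Address query needs no join at all:
clause, params = lazy_criterion("addresses.user_id", 2)
# the resulting statement is shaped like:
#   SELECT ... FROM addresses WHERE addresses.user_id = :param
```

Because no join to the parent table occurs, the same mechanism works unchanged for self-referential relationships, where a real join would require aliasing the table against itself.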
Eager Loading describes the loading of parent and child objects across a relation using a single query. The purpose of eager loading is strictly one of performance enhancement; eager loading has no impact on the results of a query, except that when traversing child objects within the results, lazy loaders will not need to issue separate queries to load those child objects.
Eager Loading is enabled on a per-relationship basis, either as the default for a particular relationship, or for a single query using query options, described later.
With just a single parameter lazy=False
specified to the relation object, the parent and child SQL queries can be joined together.
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address, lazy=False)
})

users = session.query(User).select(User.c.user_name=='Jane')
for u in users:
    print repr(u)
    for a in u.addresses:
        print repr(a)
Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
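The row-collapsing behavior of the identity map can be sketched in plain Python. This is a simplified illustration, not SQLAlchemy's implementation; the tuple layout and names are hypothetical:

```python
# Simplified sketch of how an identity map collapses the redundant
# rows of an eager-load join into distinct object instances, keyed
# by primary key identity.

class User:
    def __init__(self, user_id, user_name):
        self.user_id = user_id
        self.user_name = user_name
        self.addresses = []

def assemble(rows):
    """rows: (user_id, user_name, address) tuples produced by the join."""
    identity_map = {}   # primary key -> already-constructed instance
    users = []
    for user_id, user_name, address in rows:
        user = identity_map.get(user_id)
        if user is None:
            # first time this primary key is seen: create one instance
            user = identity_map[user_id] = User(user_id, user_name)
            users.append(user)
        if address is not None:
            # every joined row contributes one child to the collection
            user.addresses.append(address)
    return users

# three joined rows, but only two distinct users
rows = [(1, 'jane', 'a@x.com'), (1, 'jane', 'b@x.com'), (2, 'fred', 'c@x.com')]
users = assemble(rows)
```

Here the join returns one row per (user, address) pair, yet each user appears exactly once in the result list, with its addresses collection fully populated.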
Recall that eager loading has no impact on the results of the query. What if our query included our own join criterion? The eager loading query accommodates this using aliases, and is immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:
users = session.query(User).select_by(street='123 Green Street')
The join implied by passing the "street" parameter is separate from the join produced by the eager join, which is "aliasized" to prevent conflicts.
The options()
method on the Query
object is a generative method that allows modifications to the underlying querying methodology. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload()
, lazyload()
and noload()
:
# user mapper with lazy addresses
mapper(User, users_table, properties = {
    'addresses' : relation(mapper(Address, addresses_table))
})

# query object
query = session.query(User)

# make an eager loading query
eagerquery = query.options(eagerload('addresses'))
u = eagerquery.select()

# make another query that won't load the addresses at all
plainquery = query.options(noload('addresses'))

# multiple options can be specified
myquery = oldquery.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
myquery = oldquery.options(eagerload('orders.items'))
The above examples focused on the "one-to-many" relationship. Other forms of relationship are just as easy to configure, as the relation function can usually figure out what you want:
metadata = MetaData()

# a table to store a user's preferences for a site
prefs_table = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table with a 'preference_id' column
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
)

# engine and some test data
engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)
engine.execute(prefs_table.insert(), dict(pref_id=1, stylename='green', save_password=1, timezone='EST'))
engine.execute(users_table.insert(), dict(user_name = 'fred', password='45nfss', preference_id=1))

# classes
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class UserPrefs(object):
    pass

mapper(UserPrefs, prefs_table)
mapper(User, users_table, properties = dict(
    preferences = relation(UserPrefs, lazy=False, cascade="all, delete-orphan"),
))

# select
session = create_session(bind_to=engine)
user = session.query(User).get_by(user_name='fred')
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'

# flush
session.flush()
The relation
function handles a basic many-to-many relationship when you specify an association table using the secondary
argument:
metadata = MetaData()

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

engine = create_engine('sqlite:///')
metadata.create_all(engine)

# class definitions
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

mapper(Keyword, keywords_table)

# define a mapper that does many-to-many on the 'itemkeywords' association table
mapper(Article, articles_table, properties = dict(
    keywords = relation(Keyword, secondary=itemkeywords_table, lazy=False)
))

session = create_session(bind_to=engine)

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))
session.save(article)
session.flush()

# select articles based on a keyword.  select_by will handle the extra joins.
articles = session.query(Article).select_by(keyword_name='politics')
a = articles[0]

# clear out keywords with a new list
a.keywords = []
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# flush
session.flush()
Many-to-many can also be done with an association object, which adds additional information about how two items are related. In this pattern, the "secondary" option to relation() is no longer used; instead, the association object becomes a mapped entity itself, mapped to the association table. If the association table has no explicit primary key columns defined, you also have to tell the mapper what columns will compose its "primary key", which are typically the two (or more) columns involved in the association. Also, the relation between the parent and association mapping is typically set up with a cascade of all, delete-orphan. This ensures that when an association object is removed from its parent collection, it is deleted (otherwise, the unit of work tries to null out one of the foreign key columns, which raises an error condition since that column is also part of its "primary key").
from sqlalchemy import *

metadata = MetaData()

users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
)

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

# add "attached_by" column which will reference the user who attached this keyword
itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)

# class definitions
class User(object):
    pass

class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

class KeywordAssociation(object):
    pass

# Article mapper, relates to Keyword via KeywordAssociation
mapper(Article, articles_table, properties={
    'keywords':relation(KeywordAssociation, lazy=False, cascade="all, delete-orphan")
})

# mapper for KeywordAssociation
# specify "primary key" columns manually
mapper(KeywordAssociation, itemkeywords_table,
    primary_key=[itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy=False),
        'user' : relation(User, lazy=False)
    }
)

# user mapper
mapper(User, users_table)

# keyword mapper
mapper(Keyword, keywords_table)

session = create_session(bind_to=engine)

# select by keyword; the user who attached each keyword is also available
alist = session.query(Article).select_by(keyword_name='jacks_stories')
for a in alist:
    for k in a.keywords:
        if k.keyword.keyword_name == 'jacks_stories':
            print k.user.user_name
Keep in mind that the association object works a little differently from a plain many-to-many relationship. Members have to be added to the list via instances of the association object, which in turn point to the associated object:
user = User()
user.user_name = 'some user'

article = Article()

assoc = KeywordAssociation()
assoc.keyword = Keyword('blue')
assoc.user = user

assoc2 = KeywordAssociation()
assoc2.keyword = Keyword('green')
assoc2.user = user

article.keywords.append(assoc)
article.keywords.append(assoc2)

session.save(article)
session.flush()
SQLAlchemy includes an extension module which can be used in some cases to decrease the explicitness of the association object pattern; this extension is described in associationproxy.
Note that you should not combine the usage of a secondary
relationship with an association object pattern against the same association table. This is because SQLAlchemy's unit of work will regard rows in the table tracked by the secondary
argument as distinct from entities mapped into the table by the association mapper, causing unexpected behaviors when rows are changed by one mapping and not the other.