Preprint Tasora Anitescu Comp

Upload
sadasdasdff 
Category
Documents

view
227 
download
0
Transcript of Preprint Tasora Anitescu Comp

8/2/2019 Preprint Tasora Anitescu Comp
1/17
Amatrixfree cone complementarity approach for solving largescale,nonsmooth, rigid bodydynamics
Preprint ANL/MCSP16921109
A.Tasora a M.Anitescu baUniversita degli Studi di Parma, Dipartimento di Ingegneria Industriale, 43100 Parma, Italy, [email protected]
bMathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA,[email protected]
Abstract
This paper proposes an iterative method that can simulate mechanical systems featuring a large number of contacts and jointsbetween rigid bodies. The numerical method behaves as a contractive mapping that converges to the solution of a cone complementarity problem by means of iterated fixedpoint steps with separable pro jections onto convex manifolds. Since computationalspeed and robustness are important issues when dealing with a large number of frictional contacts, we have performed specialalgorithmic optimizations in order to translate the numerical scheme into a matrixfree algorithm with O(n) space complexity andeasy implementation. A modified version, that can run on parallel computers is discussed. A multithreaded version of the methodhas been used to simulate systems with more than a million contacts with friction.
Key words: Largescale multibody, contact, friction, cone complementarity problem
1. Introduction
Many engineering problems involve unilateral contacts between rigid bodies, for instance, in simulationsof robotic cells and part feeders, in cam followers, inmasonry stability analysis, and in packaging devicessuch as those depicted in Fig. 1.
The dynamical simulation of such systems is complicated by the nonsmooth nature of the frictional constraints. When the number of contacts between bodies increases to thousands or millions, as in the caseof granular flows in silos or in rocksoil dynamics, thecomputational efficiency of traditional methods can
become an issue even on supercomputers.A straightforward approach to solve this class of
problems may consist of regularization schemes, thattransform the discontinuities into a stiff force field.This is, for example, the approach often adopted bydiscrete element schemes (DEMs) because it doesnot require major modifications to traditional solversbased on smooth ordinary differential equations(ODEs) [15,35,36,30]. Nevertheless, although successfully used to simulate granular flows with many contacts, the regularization approach requires small timesteps to achieve numerical stability. Moreover, it forcesthe user to introduce artificial stiffness or heuristicparameters: actually, if the deformation of the parts isnegligible, a method that can use large time steps andunconditionally rigid bodies would be more welcome.
These considerations encouraged our research on a
fast, robust, and unified numerical scheme that canhandle complex mechanical systems made of rigid bodies with an arbitrary number of contacts and joints.Such a scheme aims at simulating mechanical systemsranging from the simplest (articulated linkages withfew bilateral kinematic pairs and motors) up to themost complex (for example, a bulldozer interactingwith millions of particles of sand with the tracks andthe blade).
In this context, the biggest challenge comes from thediscontinuous nature of the adhesion constraints andnoninterpenetration constraints; in fact, the simula
tion of rigid contacts embeds the solution of nonsmoothequations. To this end, the straightforward applicationof numerical methods for ODEs or differential algebraic equations (DAEs) is inefficient. In fact, a naiveapproach based on piecewise integrals is virtually impossible because it would require stopping and restarting the integrator at each discontinuity to change theactive set of constraints. This could work only if therewere a limited number of unilateral constraints [20,21].Otherwise, the risk of combinatorial explosion couldseverely affect the computational efficiency to the pointwhere the simulation would come to a halt [45].
The nature of nonsmooth dynamics requires the
adoption of a deeper mathematical framework, whereconcepts like setvalued functions, inclusions, and complementarity conditions are used [33]. In particular,recent timestepping approaches construct weak solu
Preprint submitted to Elsevier 21 January 2010

8/2/2019 Preprint Tasora Anitescu Comp
2/17
tions of the differential variational inequality (DVI)that describes the continuous time motion of rigidbodies with collision, contact, and friction. Earlier numerical methods based on differential variational inequalities can be found in [26,25,24], whereas the DVIformulation has been discussed in full generality andclassified by differential index only recently, in [37,29].
Two main families of solvers spawn from the DVI
formulation: those that lead to an accelerationforcecomplementarity problem [9,31,45] and that generate velocityimpulse, complementaritybased timestepping methods [39,6,7]. The latter case results inschemes convergent to a vector measure differentialinclusion, so named because it operates on vector measures or distributions [40]. It has the advantage thatit can solve a class of problems with Coulomb friction that would be unsolvable in an accelerationforcecontext, as the Painleve paradox [38].
In both cases, the introduction of inequalities intimestepping schemes for DVI, together with a polyhedral approximation of the friction cone as a faceted
pyramid, leads to linear complementarity problems(LCPs) [40], which are systems of complementaryinequalities to be satisfied simultaneously [14]. SuchLCPs, which are hard to solve because of their inherent nonlinear nature, must be solved at each time stepin order to advance the integrator [24,40].
Most literature about this topic shows how, for alarge number of contacts and rigid bodies, usual LCPsolution schemes have significant limitations. In fact,classical approaches to the solution of LCP problemsare based on simplex methods, also known as direct orpivoting methods, originating from the algorithms of
Lemke and Dantzig [13]. These methods may exhibitan exponential worstcase complexity [10]. Our experience shows that, in spite of algorithmic optimizations[43], simplex methods still cannot practically handlemultibody systems with more than one hundred colliding bodies.
Moreover, in the threedimensional case, typicalLCP solvers can be used only at the cost of approximating the Coulomb friction cone with facetedpyramids [40,45,6]. Not only does this expedient introduces artificial anisotropy in the friction phenomenon;it also impacts negatively the performance of LCPsolvers, which is already critical in general, becausethe finite approximation of cones results in a muchlarger problem.
A precise description of the friction cone constraintin threedimensional space would imply a nonlinearcomplementarity problem. This is a broader class ofproblems in mathematical programming, for which noofftheshelf solvers are available. A custom methodmust be developed.
The abovementioned limitations of the existingLCP approaches led us to develop a novel solutionmethod based on a fixedpoint iteration with projection on a convex set and presented in [8]. That method
extended the seminal work on iterative LCP solversby Mangasarian [28] to the LCP case with conicalconstraints, that is, a cone complementarity problem.In the same work we presented the convergence theory
for the iteration; the scheme converges under certainconditions that do not include a small friction assumption. Applied to granular flow problems, our methoddemonstrated high performance and was able to solvebenchmarks with up to a million dual variables.
The timestepping scheme was proven to convergein a measure differential inclusion sense to the solutionof the original continuoustime DVI [2].
In the present paper we extend our original formulation [8] in several ways.(i) We enhance our approach to the case of both
frictional contacts and bilateral constraints, either scleronomic or rheonomic (motors, imposedtrajectories, etc.). This extension cannot be obtained with full theoretical guarantees of convergence for the algorithm in [8] by simply replacingthe bilateral constraint with two unilateral constraints. The latter case allows for unbounded internal forces, for which the approach in [8, Corollary 1] does not apply (the resulting constraintcone is not pointed).
(ii) We present practical algorithmic details and optimizations that can be adopted to implementthe method in a matrixfree, memoryefficient,reliable, fast, and robust way.
(iii) We demonstrate the performance of our approach for configurations that include both jointand contactwithfriction constraints.
A significant sideeffect of the proposed method isthat it proceeds monotonically toward the solution. Ifimplemented in realtime applications such as virtualreality and manintheloop vehicle simulations, wherethe requirements on precision are less severe than those
on the computational times, it can be stopped prematurely before the tolerance threshold is reached.
We believe that the CAD community will welcomethe availability of this solver because of its abilityof simulating generic mechanisms regardless of thenumber of parts, joints, and frictional contacts. Tothis end we developed a physics library based on thistimestepping method, written in C++ and calledChrono::Engine [41], that can be used by third partiesto develop simulation software. Indeed, we used it toimplement a 3D graphical interface for the interactive modeling and visualization of multibody systems.Moreover, we have already simulated many types ofmechanical problems, ranging from robots to granularflows with hundreds of thousands of rigid bodies withfriction.
Among the most complex tests, we simulated thegranular flow of a fourthgeneration pebble bed nuclearreactor. Thanks to the numerical scheme, the refuelingmotion of 170,000 uranium spheres was simulated ona single computer, whereas DEM methods required asupercomputer and much more CPU time [42].
2. The model
This section presents a formulation for the nonsmooth dynamics of multibody systems in the mostgeneral case of both bilateral constraints and frictional
2

8/2/2019 Preprint Tasora Anitescu Comp
3/17
Fig. 1. Simulation of a palletizing device: a multibody problemwith many frictional contacts.
contacts. We remark that frictional contacts embedalso the case of unilateral constraints, since those rep
resent a special case of contacts without friction.
2.1. System state
The position of the system at time t is representedby mq generalized coordinates q(t) Rmq . In the caseof rigid bodies in threedimensional space, these coordinates include the positions x R3 of the centers ofmass of all bodies, as well as the rotations of all bodyframes respect to the absolute frame.
We represent rotations by means of unimodular
quaternions (t) S3 H. Since quaternions arefourdimensional numbers, each rigid body in ourformulation requires 3+4=7 scalar coordinates plusone constraint  = 1 enforcing the unit length ofthe quaternion 1 . With this notation, we define the
position vector to be q = {x1T ,1T ,x2T,2T,...}.Generalized velocities are represented by the vec
tor v(t) Rmv , where for each body we consider thespeed of the center of mass x R3 and the angular velocity l, expressed in local body coordinates.Therefore a system with n bodies in three dimensionsis represented by mv = 6n speed coordinates. With
this notation, we define the speed vector to be v ={x1T ,1Tl , x2T
,2T
l ,...}T.Given the angular velocity l, one can obtain the
time derivative of the quaternion, if needed, by building the purely imaginary quaternion {0,l} and computing the quaternion product = 12{0,l}. On thisbasis, we introduce q = (q,v) as the linear map thatgives the time derivative of the position:
1 Different methods can be used to represent the rotation. Ifone stores the 3x3 rotation matrix A SO(3,R), each body will
require 3+9=12 scalar coordinates, which are highly redundant.On the other hand, storing only three angles (such as the threeEuler angles or the three Cardano angles) could give problemsof singularities. Not being affected by these limitations, quaternions are better suited for computational application.
(q,v) = q = {x1T, 1T , x2T , 2T,...} == {x1T, 1
21{0,1l }T, x2
T
,1
22{0,2l }T,...}T.
(1)
For the time integration of the system position,different options exist for the function q(t+t) =(q(t),v, t), the simplest one being the firstorder
explicit Euler formula q(t+t) = q(t) + tq(t). Indeed,
for body positions a straightforward firstorder differential approximation is used: x(t+t) = x(t) + tx(t).However, if the same approach is used for quaternions,as in (t+t) = (t) + t(t), an annoying situationcan happen: quaternions may slowly lose the unimodularity and drift away from the S3 manifold, unlesssome stabilization is used to enforce  = 1. Therefore, we prefer to integrate the rotations using thefollowing exponential map that preserves the unimodularity of the quaternions:
(t+t) = (t)e{0,1
2lt}, (2)
so we get
(t+t) = (q(t),v, t) =
x1,(t) + tx1,(t)
1,(t)e{0,1
21l t}
x2,(t) + tx2,(t)
2,(t)e{0,1
22l t}
...
.
(3)We can explicitly compute the second factor of the
quaternion product thanks to the property e{0,u} =
{cos ,u sin
}of quaternion exponentials; we obtain
e{0,1
2lt} = e{0,(l/l)
1
2lt} =
= {cos 12lt, ll sin
1
2lt}. (4)
Since (2) preserves the norm of the quaternions,large t time steps can be used (although, once in awhile, it is safer to normalize all quaternions becausenumerical roundoff can accumulate small errors). Yetwe can demonstrate that, for t 0, the formula (2)still corresponds to (t+t) = (t) + t. In fact,
= limt0
(t+t) (t)
t
. (5)
Hence, substituting (2) in (5), we can write
= limt0
(t){cos 1
2lt, ll sin 12 lt} {1, 0}
t.
(6)Applying the Hopital theorem and simplifying, we
obtain = 12{0,l}, as expected.
2.2. Bilateral constraints
Most kinematic pairs, such as revolute joints, prismatic joints, and glyphs, can be expressed by means
of holonomic constraints over the relative position oftwo bodies. In general, we introduce a set GB of scalarequations
i(q, t) = 0, i GB. (7)
3

8/2/2019 Preprint Tasora Anitescu Comp
4/17
The size of the set GB is the number of basic scalarbilateral constraints, not necessarily corresponding tocomplex threedimensional mechanical joints 2 .
We assume that i(q, t) is smooth, so that it canbe differentiated to obtain the Jacobian qi =
i/qT
.Constraints must be respected also at the velocity
level: the full timederivative of the ith constraint equa
tion is
di(q, t)
dt=
i
qq +
i
t= qiT q +
i
t= 0
di(q, t)
dt= qiT(q,v) +
i
t= 0.
For simplicity, from now on we will define iT =qiT(q,v).
Note that the term i
t is nonzero only for rheonomic(timedependent) constraints such as motors and imposed trajectories.
For each bilateral constraint, there exists a Lagrange
multiplier iB such that the force acting on the systemby the ith bilateral constraint is iBi [20].
2.3. Unilateral contact constraints
Since rigid bodies cannot overlap each other, giventhe set of body shapes = {1, 2, ..., n}, weassumethat there exists a set ofGP distance functions (q, )that must satisfy the unilateral constraint conditions:
i(q, ) 0, i GP . (8)An example of such a mapping is the signed distance
function [23].For convenience, we pose the problem in terms of
contact points, since in most cases we can compute aminimal set of contact normals belonging to one of twoneighboring bodies: distances (q, ) are measuredalong those normals.
For example, in the case of spherical bodies withpositions xjs and radius rs, for all body pairs j, k thesigned distance function is = xjsxks  2rs. Note,however, that, for bodies with generic shapes, findinga proper set of contact points (and defining their i
distance functions) is not always trivial [3,4]. In fact,
there could be multiple contact points, or it could evenhappen that defining a differentiable signed distancefunction is not possible, as in the case of concave shapes[3].
Nevertheless, since we are interested in enforcingnonpenetration, what truly matters is that a signeddistance function be defined up to some value of thepenetration [4]. We thus assume that (q, ) can bedifferentiably defined at least on a neighborhood of theset (q, ) 0. Such an assumption does hold forsmooth and strictly convex bodies, such as spheres [3].
2 In the context of this work, bilateral constraints are always
considered scalar, because complex mechanical joints can bemodeled by using multiple basic scalar constraints. For example, kinematic pairs such as a ball joint require three scalarequations, a prismatic guide requires five scalar equations, andso on.
In addition, piecewise smooth bodies can be accommodated in a fixedtimestep framework, by decomposingthe distance function in components attached to eachpair of features, such as point and piecewise smoothsurface or curved edge and curved edge, and using allnormals attached to such pairs [19].
Also, we consider only a subset GA(q, , ) GPof all potential contacts, that is, only those contacts
whose surfaces are under a distance threshold :GA(q, , ) =
i i GP , i(q, ) . (9)
Special attention must be paid in implementing anefficient and robust collision algorithm for the generation of the GA(q, , ) set of contact points.
A preliminary algorithm, called broadphase collision detection, discards pairs of shapes that are fartherthan , in order to avoid a combinatorial waste of timeif checking for collision points with all pairs of bodies.We use the sweep and prune (SAP) algorithm to thisend [17].
The following narrowphase step operates on thepairs of bodies that passed the broadphase check: itfinds the contact points and their normals. We adoptthe GJK algorithm for this purpose because it featureshigh efficiency and robustness even in the case of nonsmooth surfaces [16].
Concave shapes, if any, undergo an offline conversion into sets of convex shapes using a convex decomposition algorithm; a middlephase AABB binarytree traversal is used to check collisions between thesecompounds of shapes without running into superlineartime complexity.
A thin envelope is added around all shapes using
the Minkowski sums in order to allow a small amountof interpenetration. If larger overlapping occurs, theExpanding Polytope (EPA) algorithm is used [11].
2.4. Frictional constraints
In the following section we introduce friction bymeans of conic constraints, which are an extension ofcomplementarity models discussed in [6,40].
2.4.1. The Coulomb friction model
The original Coulomb model introduces static sand kinetic k friction coefficients as the only parameters to characterize the frictional phenomena at thesurface. Although simple, this model was proven to berealistic and practical in many situations. Usually thekinetic coefficient is slightly lower than the static coefficient, but in this work we consider both to have thesame value .
If a position q is feasible and the contact is active,that is, (q, ) = 0, then at the contact we have anormal force and a tangential force.
Let n be the normal at the contact pointing fromthe second body to the first body, and let t1 and t2 be
the tangents at the contact. Here n, t1, t2 are mutuallyorthogonal vectors of length one in three dimensions.The vectors n, t1, and t2 are a function of the positionq, but we ignore this fact until the end of this section.
4

8/2/2019 Preprint Tasora Anitescu Comp
5/17
The reaction force is impressed on the system bymeans of multipliers n 0, u, and v. The normalcomponent of the force is FN = nn, and the tangential component of the force is FT = ut1 +vt2.
The Coulomb model consists of the following constraints:
n 0, (q) 0, (q)
n = 0,
n 2u +2v , vTn 2u +2v = 0,FT, vT = FTvT ,
(10)where vT is the relative tangential velocity at contact.The effect of the friction over the dynamical system isdefined by the friction coefficient R+, which typically has a value between 0 and 1 for most materials.
The first part of the constraint can be restated as
F = FN + FT = nn +ut1 +vt2 K,where K is a cone in three dimensions, whose slope isarctan().
The constraint FT, vT = FTvT requiresthat the tangential force be opposite to the tangentialvelocity. As a result, the reaction force is dissipative.In fact, an equivalent convenient way of expressing thisconstraint is by using the maximum dissipation principle [40,38,39],
(u,v) = argmin2u+2vn (ut1 +vt2)T vT.These constraints are represented by mapping the
vectors n, t1, t2 from contact coordinates to generalized coordinates [3].
We denote the generalized vector version ofn, t1, t2
by Dn,Du,Dv.In generalized coordinates, the Coulomb model be
comes [8]
FN =nDn, FT =uDu +vDv , (11)n 0, (q) 0, n(q) = 0, (12)where the tangential multipliersu, v are determinedfrom the maximum dissipation principle
(u,v) = argmin2u+2vn (uDu +vDv)T v.The last relation is obtained from the identitiesDTuv =tT
1vT
and DTvv = tT
2vT
.
2.5. The Overall dynamical model
The other dynamical data needed for the model arethe mass matrix M(q), which is symmetric positivedefinite; the external force fe(t, q, v); and the inertialforce fc(q, v), containing the centrifugal and Coriolisforces.
We can define the total force
ft(t, q, v) = fe(t, q,v) + fc(q, v). (13)
Assume now that we have multiple contact con
straints i(q, ) 0, i GA and multiple bilateralconstraints i(q, t) = 0, i GB. Note that the unilateral condition n 0, 0, n = 0 can be writtenas a complementarity constraintn 0 0.
The continuous model is a differential variationalinequality [37]:
M(ql)dv
dt=
iGA
inDin +iuDiu +ivDiv++iGB
iBi + ft(t, q,v)q = (q, v)
i(q, t) = 0 i GBin 0 i(q, ) 0, i GAiu,iv = argminiin(iu)2+(iv)2 i GA
vTuDiu +vDiv .
(14)Unfortunately, the introduction of the Coulomb
friction model may lead to an inconsistent model. Itis known [9] that paradoxical configurations exist forwhich such a model does not have a solution in termsof unknown accelerations and reaction forces. Such
configurations are called Painleve paradoxes [39]. Nevertheless, a weaker formulation of the problem can besolved in terms of vector measures, using a nonsmoothtimestepping scheme where reaction impulses are theunknowns at each time step [39].
To this end we define the following stepping scheme,with time step h, known positions q(l), and velocityv(l); the scheme is an equation problem with equilibrium constraints, where the unknowns are q(l+1),v(l+1), and constraint impulses n = hn, u = hu,v = h
v, B = h
B:
M(l)(v(l+1)
vl) = iGA inDin + iuDiu + ivDiv ++iGB
iBi
+ hft(t
(l),q(l), v(l))(15)
0 =1
hi(q(l)) + iTv(l+1) +
i
t, i GB(16)
0 1h
i(q(l)) + iTv(l+1) (17) in 0, i GA
iu, iv
= argmin
iin
(iu)2+(iv)
2 i GA
vT(uD
iu + vD
iv)
(18)
q(l+1) = (q(l), v(l+1), h). (19)
To simplify notation, we denoted M(ql) by M(l).In previous work, we have shown that the scheme is
convergent, as the time step h goes to 0, to the solutionof a measure differential inclusion [2].
For the special case of zero friction, the subproblemsimplifies to (1517), that is, a linear complementarityproblem. Such problems can be solved by Lemkes algorithm [14,6]. Introducing the Coulomb friction (18),however, turns the problem into a nonlinear complementarity problem that poses more difficulties. If thenonlinear constraint cone (the Coulomb cone) is ap
proximated by a piecewise linear cone, the subproblem(1518) becomes again an LCP solvable by Lemkes algorithm [6]. Nevertheless, in [5] we have also demonstrated that, as the number of constraints in the prob
5

8/2/2019 Preprint Tasora Anitescu Comp
6/17
lem increases, the computational cost of typical LCPsolvers increases far faster than linearly with the size ofthe problem. Moreover, the approximation of frictioncones by means of faceted pyramids would introduceunwanted anisotropy.
To overcome these difficulties, we modified the timestepping scheme by relaxing the constraint (17) as
0
1
hi(q(l)) +
i
T
v(l+1)
i
(Di,Tu v)2 + (Di,Tv v)
2 in 0, i GA.(20)
This results in a cone complementarity problem thatcan be solved with a fixedpoint iteration approach, asdemonstrated in our earlier work [8].
2.6. Cone complementarity formulation
Developing the optimality conditions for the equilibrium constraint in (18), we obtain that there existsa Lagrange multiplier i such that, for any i
GA,
iiu = Di,Tu v, iiv = Di,Tv v,i 0 iin
(iu)
2+ (iv)
2 0.(21)
The first two equations imply that i
(iu)2+(iv)
2=(Di,Tu v)
2
+(Di,Tv v)2, while the complementarity con
straint implies that
0 = i
(iu)2 + (iv)
2
iin
(iu)
2 + (iv)2
and, in turn, that
iin(Di,Tu v)
2
+(Di,Tv v)2 = i
iu
2+
iv
2
.(22)We now define, for all potential contacts, the vectors
uiA =
1
hi(q(l)) + iTv(l+1),Di,Tu v,Di,Tv v
T(23)
iA =
in, iu,
iv
T, i GA. (24)
We calculate the scalar product using (20),(21):uiA,
iA
= in
1
hi + iTv
+ iuD
i,Tu v
+ ivDi,Tv v
= i
in(Di,Tu v)2+(Di,Tv v)2
i
iu2
+
iv2
= 0 uiA iA. (25)We recall that the dual cone of a convex cone K is
the set K = {xy Ky,x 0} and that the polarcone is defined as K = K.
We now define the friction cone FCi such that theCoulomb friction model is satisfied ifiA FC i:
FCi =
x,y,z R3ix
y2 + z2
.
Then, from (18), (20), and (25), the frictional contactconstraints can be expressed by means of the followingcone complementarity constraints :
uiA FC i iA FC i, i GA. (26)
To obtain a unified formalism, we can represent alsothe bilateral constraints (16) in terms of cone complementarity constraints. Of course, multipliers iB, withi GB, are not restrained into some special subset ofR, but even R itself is a convex cone. Thus we can introduce the scalar
uiB =1
hi(q(l)) + iTv(l+1) +
t, i GB, (27)
which allows us to write the bilateral constraints (16)as
uiB BCi iB BCi, i GB, (28)where BCi = {R} and BCi = {0}, and uib, ib = 0 isalways satisfied for uiB BCi.
We now define the vector
k(l)
= M(l)v(l) + hft(t(l), q(l),v(l)). (29)
Then, equations (29), (28), and (26), together with(15) and the definition ofuiA,
iA, u
iB, and
iB, result
in the following problem:
M(l)v(l+1) = iGA
inDin + iuDiu + ivDiv++
iGB
iBi
+ k
(l),
uiA FC i iA FC i, i GA uiB BCi iB BCi, i GB.
(30)If we want to obtain a cone complementarity
problem as expressed in the typical form K f(a) a K, a more compact formulation of theproblem (30) is necessary. To this end we denote by
nA and nB the number of elements in the sets GA andGB, respectively. Then, we define the following vectorsbA R3nA , A R3nA , bB RnB , and B RnB :
bA =
1
hi1 , 0, 0,
1
hi2 , 0, 0, . . . ,
1
hinA , 0, 0
TA =
i1n ,
i1u ,
i1v ,
i2n ,
i2u ,
i2v , . . . ,
inAn ,
inAu ,
inAv
TbB =
1
h1 +
1
t,
1
h2 +
2
t, . . . ,
1
hnB +
nB
t
TB =
1B,
2B, . . . ,
nBB
TuA = u
1T
A ,u2T
A , . . . ,unAT
A T
,uB = u1B, u
2B, . . . , u
nBB
T.
(31)It is useful to merge these vectors, joining data from
both frictional constraints and bilateral constraints,obtaining vectors with nE = 3nA+ nB scalar elements:
bE =bTA, b
TB
T, E =
TA,
TB
T, uE =
uTA,u
TB
T.
(32)For each frictional contact i GA we also define the
following threecolumn matrix:
Di =DinDiuDiv
. (33)
As before, it is useful to merge all Jacobians from
both frictional constraints and bilateral constraints ina single, large matrix having nE columns and mv rows:
DE =
Di1 Di2  . . . DinA 12 . . . nB .(34)
6

8/2/2019 Preprint Tasora Anitescu Comp
7/17
From the definitions (23),(27), (31), (32), (33), and(34) one can see that
uE = DTE v
(l+1) + bE . (35)
Also, premultiplying by M(l)1
equation (15), one gets
v(l+1) = M(l)1
DEE + M(l)1k. (36)
Hence it is possible to substitute (36) into (35) to ob
tain
uE = DTE M
(l)1DEE + DTE M
(l)1k + bE . (37)
To make the expressions more compact, we introduce the following:
N = DTE M(l)1DE (38)
r = DTE M(l)1k + bE (39)
In this way, we can write
uE = NE + r. (40)
Consider the multidimensional cone obtained byperforming the direct sum of all FC and BC conesand its embedding in the corresponding vector spacedirect sums:
=
iGA
FCi
iGB
BCi
. (41)
From (41), (40), (31), the complementarity relationship in (30), and the property of convex cones [22], =
i i = i i,, we can write the problem as
a cone complementarity problem (CCP):
(NE + r)
E . (42)The separable structure of the cone will allow us
to define an algorithm based on block matrices, withrelatively small blocks (dimension no larger than 3 inthe case of the contact problem).
2.7. Physical effects of the relaxation
In [2] we demonstrated that, for h 0, the solution of the timestepping scheme with the relaxed constraints (20) will approach the solution of the samemeasure differential inclusion as the scheme that uses
the unrelaxed constraints (17). In addition, iteratesproduced by the modified scheme approach the ones ofthe original scheme even for one time step at fixed h,
provided that iin
(Di,Tu v)
2 + (Di,Tv v)2

8/2/2019 Preprint Tasora Anitescu Comp
8/17
both cases the method is similar to a projected blockGaussSeidel. Another option is to use a null K matrix, so that the method is similar to a projected blockGaussJacobi.
Also, : RnE RnE is the orthogonal projection
operator, whose calculation, as we show in the subsequent sections, is considerably simplified by the separable cone structure (41).
For convergence of the scheme we need the followingassumptions about the algorithm.
A1 The matrix N of the CCP problem is symmetric andpositive semidefinite.
A2 There exists a positive number, > 0 such that, atany iteration r, r = 0, 1, 2, . . ., we have that Br I.
A3 There exists a positive number > 0 such that,at any iteration r, r = 0, 1, 2, . . ., we have that
(r+1 r)T
(Br)1
+ Kr N2
(r+1 r)
r+1 r
2
.
Note that the first assumption is always assured inmultibody systems because the mass matrix is definitepositive, Br and are under our control for satisfyingthe second assumption, and it is always possible toadjust the free and parameters in order to satisfythe third assumption. Our main convergence result [8,Corollary 1] is given in Theorem 1.
Theorem 1 Assume that
0 = E NE = 0(that is, there does not exist a choice of reaction forceswhose net effect is zero; bodies do not get stuck).
Then the algorithm (44) for the cone complementarityproblem applied to (42) produces a bounded sequence,and any accumulation point results in the same velocitysolution.
We note that, even if in [8] we have demonstrated theuse of our algorithm for contacts only, Theorem 1 applies to the bilateral case as well. Nevertheless, the keyobservation is that we consider the bilateral constraintsas linearized constraints for which we develop the additional algorithmic machinery in this paper. The directapplication of the formulation in [8], by formulating abilateral constraint as two unilateral constraints will
violate the main assumption in Theorem 1. Therefore,we had to show that direct treatment of bilateral contraints will result in the same formal abstraction (42).
3.1. Efficient computation of the projection onto thefriction cone
Because of the separable cone structure of the convexsubspace , the metric projection R
nE RnE is
=
FC1(
1A)
T, FC2(2A)
T, . . . , FCnA (nAA )
T,
BC1
(2B)
T
, BC2
(2B)
T
, . . . BCnB
(nBB )
TT ,where FC i() : R3 R3 with i GA, and BCi() :R R with i GB . The projection operator mustbehave as () = argmin  in order to be
FC
i
A
FC(A)
nnnn
v
u
n
r
tttt1
tttt2
ttttg
tttte
FC
ioooo
A
f
f
r
r
n
n
r

8/2/2019 Preprint Tasora Anitescu Comp
9/17
i GB BCi = iBi GA
r < in FC i = iA
r < 1i
n FC i = {0, 0, 0}T
r > in, r > 1i
n FC in =ri + n
2i + 1
FC iu = u iFC in
r
FC iv = viFC in
r.
(46)
4. Implementation
The CCP method proposed here can be applied tothe simulation of multibody systems with a large number of parts and contacts because, where an upper limiton the number of iteration is enforced, the iteration(44) can run in O(n) space and O(n) time.
Previous sections showed that generic multibodyproblems with frictional contacts, expressed with thesystem (15)(19), embed the cone complementarityproblem (42), which can be solved by the iterativemethod (44).
Given (31), one can consider the final timesteppingscheme as a sequence of three main operations: a CCPproblem that finds unknown reactions E (47a), anaffine scaling (47b) that gives the new speeds v(l+1),and a position update (47c):
(NE + r) o
E (47a)v(l+1) = M1
k + DEE
(47b)
q(l+1) = (q(l),v(l+1), h). (47c)
The biggest computational overhead is caused bythe first problem, that is, the CCP (47a). In fact, (47c)is immediate, and (47b) can be computed quickly because in most cases the matrix M is diagonal and itsinverse M1 can be precomputed easily.
The convergence theory about the iterative scheme(44) leaves some degrees of freedom in choosing iA,
iB
values that build the diagonal blocks of the iterationmatrix B. A trivial choice could be to use the sameiA =
iB = value for all diagonal blocks, that is,
B = I, and then use the overrelaxation parameter to control the convergence. However, this may slowconvergence in systems with large mass ratios, evenwith an optimal . A more practical approach, whichcopes better with systems affected by uneven masses, isinspired by the GaussJacobi idea of using the inverseof the diagonal of the system matrix N, so we use iB =
1i,TM1i
and iA =1gi
, where gi is the average ofthe diagonal values of the ith block of the N matrix.
We note that gi can be computed easily from the traceof the 3 3 matrix Di,TM1Di, as
gi =Trace(Di,TM1Di)
3. (48)
We recall that the matrix N is a product of largematrices; N = DTE M
1DE , and it is full even if Dand M are sparse. For systems with a large number ofcontacts, the size ofN would be prohibitive and clearlywould not satisfy the goal ofO(n) space complexity. Tothis end, direct multiplication of vectors and matricesin (44) must be avoided; otherwise the effort and thespace requirement would be superlinear in the number
of constraint.For the reasons above, a scheme that does not needthe explicit building of N, B, and K has been developed, exploiting the sparsity of M and D.
The K matrix in (44) can be chosen freely, withinthe convergence limits posed by assumptions [A1][A3]. Among the most noticeable options, we have thecase K = 0, which results in a scheme like a projectedGaussJacobi, or the case where K is built by usingthe lower blocks of N.
4.1. Optimized projected GaussJacobi CCP
Considering the case K = 0, and recalling Eq. (38),we can express the rth step of the iteration (44) as aninner loop with index i = 1 . . . nA on all nA frictioncones FCi:
i,r+1A =
i,rA iA
Di,TM1
nAz=1
Dzz,rA +
+
nBz=1
zz,rB + k
+ biA
(49)
i,r+1A = FC i
i,r+1A + (1 )
i,rA , (50)
followed by an inner loop with index i = 1 . . . nB onall nB bilateral constraints (which we do not reportbecause it is like (49)(50) except for B instead ofAsubscripts).
However, for each iteration, the previous loop i =1 . . . nA would require quadratic time in terms of potential contacts nA because of the presence of the summations
nAz=1. This major source of slow performance
can be eliminated if one computes the algorithm in incremental form. In fact, from (36) it follows that
vr
= M1
nA
z=1
Dz
z,r
A +
nB
z=1
z
z,r
B +k . (51)
Substituting (51) into (49), we can write
i,r+1A =
i,rA iA
Di,Tvr + biA.
(52)
Considering the optimizations above, we can expressthe final CCP algorithm with the pseudocode of Algorithm 1.
In the proposed algorithm, for achieving high performance, some auxiliary data can be precomputed before starting the iteration. Specifically, we introducethe mv
3 matrix EiA = M
1Di and the vector EiB =
M1i.The iterations, usually stopped when an approxi
mation threshold has been reached, can be also prematurely aborted when r exceeds a limit rmax on the
9

8/2/2019 Preprint Tasora Anitescu Comp
10/17
Algorithm 1: Solve complementarity  PGJ CCP(1) // Precompute some data for friction constraints(2) for i := 1 to nA(3) Ei
A= M1Di
(4) iA
= 3Trace(Di,TEi
A)
(5) // Precompute some data for bilateral constraints(6) for i := 1 to nB(7) Ei
B= M1i
(8) iB
= 1i,TEi
B
(9)(10) // Initialize impulses(11) if warm start with initial guess
E
(12) 0E
= E
(13) else(14) 0
E= 0
(15)(16) // Initialize speeds(17) v0 =
nAi=1
EiAi,0A
+nB
i=1EiBi,0B
+ M1k(18)(19) // Main iteration loop(20) for r := 0 to rmax(21) // Loop on frictional constraints(22) for i := 1 to nA(23) i,r+1
A=
i,r
A i
A Di,Tvr + biA;(24) i,r+1A = i,r+1A + (1 )i,rA ;(25) // Loop on bilateral constraints(26) for i := 1 to nB(27) i,r+1
B=i,r
B i
B
i,Tvr + biB
;
(28) i,r+1B
= i,r+1B
+ (1 )i,r
B;
(29) // Update speeds(30) vr+1 =
nAi=1
EiAi,r+1A
+nB
i=1EiBi,r+1B
+
M1k
(31)(32) return E , v
maximum number of iterations if the simulation must
meet hardrealtime requirements.With minimal modifications to the () operator,the proposed method can be easily adapted to the caseof friction in 2D or the case of generic unilateral constraints.
In our simulations, we chose = 1 and = 1, exceptfor the K = 0 case, where we used = 0.2. We cannotguarantee a priori that this will satisfy condition [A3],but it did for all our simulations. In addition, the matrix sequences Kr and Br were constant. We can therefore claim that Theorem 1 does apply and, since thesequence did not diverge, any accumulation point is asolution of the cone complementarity problem (47a).
In addition, our proofs of the theoretical results allowfor similar conclusions if varies from iteration to iteration. Therefore, we could ensure that at some iteration the appropriate is chosen after decreasing itsvalue a few times until assumption [A3] holds. It canbe shown that if the value of is halved each time [A3]does not hold and the respective iteration is rejected,then [A3] will eventually be satisfied after a finite number of steps. In our experiments, however, the valueswe have chosen for and have worked for all iterations without need of further adjustment.
4.2. Optimized projected GaussSeidel CCP
Another option is to take Kas the lower block structure of N, which results in a scheme similar to a pro
Algorithm 2: Solve complementarity  PGS CCP(1) // Precompute some data for friction constraints(2) for i := 1 to nA(3) Ei
A= M1Di
(4) iA
= 3Trace(Di,TEi
A)
(5) // Precompute some data for bilateral constraints(6) for i := 1 to nB(7) Ei
B= M1i
(8) iB
= 1i,TEi
B
(9)(10) // Initialize impulses(11) if warm start with initial guess
E
(12) 0E
= E
(13) else(14) 0
E= 0
(15)(16) // Initialize speeds(17) v =
nAi=1
EiAi,0A
+nB
i=1EiBi,0B
+ M1k(18)(19) // Main iteration loop(20) for r := 0 to rmax(21) // Loop on frictional constraints(22) for i := 1 to nA(23) i,r+1
A=
i,r
A i
A Di,Tvr + biA;(24) i,r+1A = i,r+1A + (1 )i,rA ;(25) i,r+1
A= i,r+1
A
i,r
A;
(26) v := v+ EiA
i,r+1A
.(27) // Loop on bilateral constraints(28) for i := 1 to nB(29) i,r+1
B=i,r
B i
B
i,Tvr + biB
;
(30) i,r+1B
= i,r+1B
+ (1 )
i,r
B;
(31) i,r+1B
= i,r+1B
i,r
B;
(32) v := v+ EiB
i,r+1B
.(33)(34) return E , v
jected GaussSeidel. The Kmatrix does not need to beexplicitly built: its effect is that, as soon as computed,a reaction impulse i will be used also for computingthe following i+1 impulse, and so on for all i, withoutneeding to finish a single iteration. In practical termsthis means that the
nAz=1 D
zz,rA term in Eqs. (49) and
(51) is split ini1
z=1 Dz
z,r+1A +
nAz=i D
zz,rA , and the
same for bilateral constraints; so the difference fromAlgorithm 1 is that after the update of a single multiplier we immediately update v(l+1), as shown in Algorithm 2. Note that, to update the speeds, we avoid thefull summation (36) and add only the contributions
caused by the change of the single multiplier after theprojection (50):
i,r+1A = i,r+1A i,rA ; (53)
v(l+1),i+1 = v(l+1),i + M1Dii,r+1A . (54)
Hence, the computational overhead is not differentfrom Algorithm 1.
Numerical tests show that this scheme convergesfaster than the case of K = 0; moreover, K = 0 ismore prone to slow convergence in case of redundantconstraints. 3
3 Redundancy in constraints is frequent in simulations of degenerate contact situations, such as flat surface against flat surface, where the collision engine may create a large amount ofsuperfluous contact points.
10

8/2/2019 Preprint Tasora Anitescu Comp
11/17
i,T=EEEE i,T=
b i=B i= i=
B B
B
i,T{A i,T{B
Pointer to rigid bodyA
Pointer to rigid body B
ith bilateral constraint
Di,T=
Ei,T=A
bbbb i=A i= i=
A A
i= Pointer to rigid body APointer to rigid body B
ith frictional contact
Mj= ffffj= vvvvj=xxxxj
j....
l
{{
jth rigid body
Fig. 3. Sparse data structures used by the solver.
5. Optimizations and improvements to the
method
The proposed method can be further developedby introducing some algorithmic and theoretical improvements, obtaining different flavours of the originalscheme. This section discusses the most significantoptimizations.
5.1. Transient data structures
Figure 3 shows how the multibody model is represented by structures that are placed in memory. Thistransient data can be allocated on the heap duringruntime, creating lists with unlimited numbers of constraints and rigid bodies.
Basically, each object that builds up the lists of bilateral constraints encapsulates the pointers to the twoconnected bodies, the Lagrange multiplier iB , the constraint residual biB, the scalar value
iB, and the Jaco
bian i
.The Jacobian should be a long row with mv elements, but this would mean that the storage requirement for each constraint depends on the number ofbodies, hence leading to an algorithm with superlinear space complexity. Instead, we store only the twosmall portions of i that are not null, that is, thetwo parts iA and iB corresponding to the twoconnected bodies, hence requiring only a fixed number of 12 elements per Jacobian. When the i,Tvrmultiplication must be performed, it is computed asi,TA vrA + i,TB vrB. Similar considerations apply tothe data structure for frictional contacts, except that
also i coefficients must be stored and that Jacobiansare made of two larger blocks, each with three rows.Thanks to this major optimization, the memory requirement per constraint and per contact is constant,
so we obtain an optimal O(nE ) space complexity. 4
Looking at Algorithms 1 and 2, one can see that thesolver can operate directly on these structures. Hence,no temporary matrices are necessary. Our method isthus a matrixfree method.
5.2. Stabilization factor
The stabilization terms 1hi(q(l)) and 1hi(q(l)) in(16) and (17) are used to avoid constraint drifting during the time integration [4]. In fact, if these terms weremissing, equations (16) and (17) would simply enforcethe closure of constraints at the speed level, but errorsmight slowly accumulate in constraint positions afterseveral integration steps. 5 .
If the model were linear (that is, the matrix DE wereconstant in space, and, implicitly, in time), these termswould close constraint gaps, if any, in a single step.We experienced that, for bilateral constraints, this approach works well even if the time integration step h is
small. However, this is not always the case when dealing with unilateral constraints, for the reason that follows. Because of numerical issues and nonlinearities,it is not possible to avoid some amount of penetrationin contact constraints. For a contact with penetration,the term 1h
i(q(l)) has a negative value. Hence, the in
equality (17) requires that iv(l+1) (the speed of detachment of the contact) be large enough to bring thetwo surfaces at zero distance in a single step h. Afterthe time integration step, at the next step the two surfaces might keep the 1h
i(q(l)) separation speed, hencecausing a bouncy behavior. Apart from being not predictable, this side effect gets worse if small timestepsh are used, because even the smallest interpenetrationmight cause macroscopic effects that can be perceivedas bouncy motions even in contacts that should haveno restitution.
We investigated different strategies to improve theoriginal stabilization method. One idea was to scalethe terms by Ki factors, with 0 < Ki < 1:
Ki
hi(q(l)) ,
Ki
hi(q(l)).
This diminishes, but does not exclude, the risk of exceeding the interpenetration correction in a single step.
Moreover, it has the drawback of creating artificial softness in stacked objects, since contact stabilization issomewhat delayed in successive frames.
A second approach, which we successfully used inmany tests, involves using a clamping A as the maximum orthogonal speed for penetration recovery. Thisdoes not eliminate the risk of popping out from an intersecting contact, but at least the residual speed ofseparation, if any, is often negligible (comparable with
4 We experienced that in many cases of granular flow simulations, the number of contacts tends to scale linearly with thenumber of rigid bodies n. Hence, under those circumstances
the algorithm shows approximate linear space complexity O(n)also in the number of bodies.5 This would happen either because of the numerical integration, whose finite precision cannot catch all geometric nonlinearities, or because of numerical truncation and errors.
11

8/2/2019 Preprint Tasora Anitescu Comp
12/17
the parameter A, which can be adjusted by the user)and independent of the time step h. Thus, the stabilization terms would become
1
hi(q(l)) , max
1
hi(q(l)), A
.
Note that in this case, if the contact surfaces are separated, we have i > 0 and the A clamping has no
effect; thus it behaves as the original scheme with the1h
i(q(l)) term.Until now we discussed how the stabilization term
can correct the contact penetration errors when thesurface distance is negative. However, it also has a useful side effect also when surfaces are not yet in contact. In fact, Eq. (17) can be interpreted as follows: thecontact constraint is enforced only if the two surfacesare approaching fast enough to close the positive gapi(q(l)) in a h time step. This allows us to include inthe multibody system also contact constraints that arenot yet in contact but simply within a warning envelope; later, the cone complementarity solver will do
the rest.The third approach is more expensive in terms of
CPU effort, because it avoids constraint drifting bysolving an additional complementarity problem overbody positions, at each time step [12]. In this case, onesolves the speed CCP problem using no stabilizationterms on bilateral constraints, except for contacts thatare not yet in contact, where it is useful to have
max
1
hi(q(l)), 0
,
so that contacts with clearance are still allowed to ap
proach until contact, because of positive1
hi
, whilesurfaces already in contact are forced to have a separation speed greater than or equal to 0, regardless of theamount of penetration. Later, after the time step advancement, one performs the following poststabilization step, that is a a linear complementarity problemthat corrects the positions q:
Mq =iGA
inD
in
+iGB
ibi
(55)
0 = i(q(l)) + iTq, i GB (56)0
i(q(l)) +
i
T
q
in
0, i
GA(57)
This problem can be solved by using the same solverused for the speed CCP problem. The poststabilizationidea performs better in the case of very large penetrations and illposed initial conditions.
In Fig. 4 a benchmark shows that the effort in reducing the maximum penetration error q with poststabilization iterations is almost the same as that ofthe basic approach using only the stabilization coefficient in the speed CCP. However some iterations onthe speed CCP still must be performed. For instance,in our complex simulations involving dense granularflow, the number of iterations of the poststabilization
is comparable to the number of iterations for the speedCCP, so computational efforts are almost doubled.
A family of methods can be generated by balancingthe total overhead toward the speed CCP iterations
20 30 40 50 60 70 8020
30
40
50
60
70
80
0.002
0.002
0.004
0.0040.0060.0080.010.012
0.001
post stabilization iterations
speedi
terations
q
WU
Fig. 4. Penetration error: tradeoff between speed iterationsand poststabilization iterations, compared to WU overhead(GPU working units).
or the poststabilization iterations. In the extreme casewhere no iterations for the speed CCP are performedand iterations are performed only for the poststabilization, this method becomes similar to the positionbased dynamics algorithm, proposed in [27] as a robustmethod for realtime simulations.
Note that in this section we did not discuss the caseof collisions with restitution and we assumed the simulation of contacts with fully plastic collisions. Thereader interested in modifications to the method, forsimulating also of the restitution phase, can read [1].
5.3. Parallelization of the algorithm
Algorithm 2 is inherently sequential, so Algorithm 1
can be easier to implement on computational architectures exploiting parallel processing because it does notfeature data dependency for most of the inner loop.
In Algorithm 1, loops at rows 23 and 27 can be easilyexecuted in parallel because they do not need to writeat the same memory address at the same time. In thebest scenario, one could run up to nA+nB independentthreads, each performing an update of its multiplier .
However, the sums at row 30 are more critical if parallelized in the same way, that is, on the basis of athread per contact, since there is the risk that multiple threads will need to update the same element of
the speed vector at the same time (this situation happens, for instance, if two contacts that refer to the samebody are processed simultaneously). This problem canbe solved in many ways, for example, by using a vector of Boolean mutexes, one per rigid body, which canprevent one thread to modify the speed of a rigid bodyif some other thread is writing into it. Otherwise, if thenumber of physical threads is very large as in GPU architectures, it is worth performing the sums in 30 usingparallelreduction algorithms.
We implemented this parallel algorithm on both amulticore processor and on an NVIDIA Tesla C870GPU board featuring a massively multithreaded archi
tecture, thanks to a stream processor that is capable ofprocessing hundreds of threads in parallel. In the latter case, we experienced a speedup of 15 times respectto the serial implementation [44].
12

8/2/2019 Preprint Tasora Anitescu Comp
13/17
6. Examples
We present two benchmarks that we used to test theperformance of our algorithm and an application tothe simulation of the refueling of a nuclear reactor.
6.1. Forklift truck simulator
As a complex mechanical system, a forklift truckrepresents a significative benchmark for our algorithmbecause it entails most aspects that we discussed inthis article: rigid bodies, bilateral constraints, appliedforces, and unilateral contact constraints with friction.In detail, our truck model is made of seven rigid bodies(the frame, three wheels, the steering strut, the mast,the carriage with the forks) connected by six revoluteand prismatic joints. The tilting of the mast, the liftingof the carriage, and the steering are obtained by introducing rheonomic bilateral constraints. This is a sim
plified approach, but more detailed models of the actuators, with hydraulic components and feedback controllers, are also possible within this framework. Motors and brakes provide torques to the wheels; motors,brakes, and actuators can be controlled in real time bythe user, using a joystick or a keyboard, or by usingautomatic procedures.
In order to also test frictional constraints, a seventhrigid body is added: a wooden pallet of EUR/ISO1type: the truck can pick and move such a pallet withthe forks. Specific collision shapes have been defined todetect the contact points between the pallets, the environment, and the truck. Collision shapes have been
used also for frame of the forklift and its overheadguard, so that we can simulate the rollover of the vehicle and other hazardous events.
On our test system, a dualcore Centrino T2600 2.17GHz with 2 GB of RAM, the proposed algorithm is ableto simulate the truck in faster than real time, using afixed timestep h = 0.005 s.
To assess the efficiency and capability of our matrixless approach, we simulated an increasing number offorklift trucks, up to 1,600 vehicles with 1,600 pallets, see Fig.5. In the largest simulation scenario, with12,800 rigid bodies, the algorithm must handle 54,400
bilateral constraints and, on average, 19,000 frictionalcontacts: this means more than 110,000 primal anddual variables.
We used Algorithm 1 with a limit of 20 iterations.Table 1, averaged over 100 steps, shows that the CPUoverheadgrowsalmost linearly with the number of dualvariables 3nA + nB, that is, somehow proportional tothe number of bodies.
Note that the frame rate in the case of hundreds oftrucks, although not real time as for few trucks, is stillfast and interactive.
Increasing the number of iterations results in an improvement of the precision: going from 20 iterations to
80 iterations, the largest errors in constraint positionand in constraint velocity decrease, respectively, from0.0310 mm and 0.024 m/s to 0.006 mm and 0.002 m/s.These results about precision are not dependent on
Table 1Timestep performance.
No. of Trucks CCP Solve [ms] Collision [ms] Bodies 3nA + nB
1 0.22 0.1 8 70
400 135 39 3200 28000
800 268 79 6400 56000
1200 396 130 9600 84000
1600 550 200 12800 112000
the number of trucks, because the convergence of thesolver is not affected by the increasing complexity ofthis type of benchmark, where the dynamics of eachvehicle is uncoupled. On the other hand, our results indicate the efficiency of our algorithms, which is due toour customization to rigidbody dynamics structure.For example, the most efficient offtheshelf algorithmsin [32], which are either of either the interiorpoint orthe projected gradient type, still need around 20 s pertime step for about 10,000 dual variables. Those algo
rithms solve the same problem as here. If linear scalingwould hold for those methods a big assumption intheir favor our algorithm will still be more than 100times faster.
How precision and convergence can be affected bysystems with more stringent topology is investigatedin the following example.
6.2. Dense granular packing benchmark
Dense stacking of granular material is one of thehardest problems involving nonsmooth rigid body dy
namics. Indeed, the benchmark described in this section involves the simulation of the progressive stackingof many convex shapes in an empty box, with differentsettings.
The rigid bodies have a mass m = 10 kg, the moments of inertia are Ixx = Iyy = Izz = 10.24 kgm2,and the friction coefficient is = 0.4. The horizontalsection of the box measures 20 m20 m. We performedtests with different types of colliding shapes, but forthe graphs presented here we used spheres with a radius r = 1.6 m. At the beginning of the simulation thebox is filled by 220 spheres, with randomized positions
and with an initial volume fraction of 0.4, on average.After the spheres have settled, we obtained the plots.Figures 6 and 7 show a typical convergence pattern
of the PGS CCP algorithm: v is the violation of theconstraints at the speed level. For increasing , the iteration shows a faster convergence. However, large values may lead to nonmonotonic behavior (that does notnecessarily lead to divergence). We found that a goodtradeoff between convergence speed and a monotonicnondivergent iteration, for scenarios involving equallysized spheres, is = 1.
Figure 8 shows that the parameter acts like asmooting factor over the iteration. The higher the pa
rameter the slower is the convergence, but the loweris the risk of nonmonotonic or divergent patterns. Theoptimal value depends on the type of simulation; weexperienced that, on average, good default values can
13

8/2/2019 Preprint Tasora Anitescu Comp
14/17
Fig. 5. Benchmark: simulation of thousands of forklifts.
0 10 20 30 40 50 60 70 80
101
iterations

v
=0.2
=1.4
1.0
Fig. 6. Convergence of the residual for varying , for a sampletime step in the 220sphere benchmark.
0 10 20 30 40 50 60 70 80
101
iterations

=0.2
=1.4
1.0
Fig. 7. Convergence of for varying , for a sample timestep in the 220sphere benchmark.
0 10 20 30 40 50 60 70 80
101.6
101.4
101.2
=0.2
=1.0
iterations
v
Fig. 8. Effect of the smoothing parameter , for fixed = 1.
0 5 10 15 200
0.002
0.004
0.006
0.008
0.01
0.012
stack height (n. of spheres)
v

rmax = 20
rmax = 40
rmax = 60
rmax = 80
rmax = 100
Fig. 9. Effect of the vertical size of the stack on the convergence, for varying number of iterations.
0 10 20 30 40 50 60 70 80
101
iterations

v
PGS, =0.2
PGS, =0.4
PGS, =0.6
PGJ, =0.2
PGJ, =0.4
PGJ, =0.6
Fig. 10. Comparison of the convergence of the PGS and PGJalgorithms during the granular stacking benchmark, and example of divergence.
1 2 3 4 5 6 7 8 9 10
x 103
20
30
40
50
60
70
80
0.0
002
0.0002
0.0
002
0.0
004
0.0004
0.0004
0.0006
0.00
06
0.0008
0.00
08
0.00
1
0.00
1
0.00
12
0.00
14
0.00
16
0.00
18
h timestep [s]
iterations
1e
005
1e
005
1e
005
5e
005
5e005
5e
005
0.0
001
0.0001
0.0
001
q
Fig. 11. Penetration error: combined effect of time step andnumber of iterations during the benchmark.
be chosen in the range = 0.8 1.0.The convergencecan be largely affected by the topology of the mechanical system; the bestcase scenariobeing many single objects on a flat plane, and the worst
case being the objects stacked in a vertical row. This isshown in Fig. 9, where we performed simulations withthe same number of spheres but with different sizes ofboxes. The precisionof the iteration, for a fixed number
14

8/2/2019 Preprint Tasora Anitescu Comp
15/17
Fig. 12. Simulation of bidisperse granular flow in a PBR nuclear reactor (170,000 bodies). The inner column of graphitepebbles shows almost no dispersion in the surrounding fissilematerial.
of iterations, deteriorates proportionally to the heightof the stack.
Figure 10 shows that, for low , the convergence ofthe PGJ CCP algorithm is comparable to the convergence of the PGS CCP algorithm. However the PGJCCP method is more likely to run into divergence, es
pecially in the case of redundant constraints. Therefore, it should be used with low , for example = 0.2in this case.
In Fig. 11 we show the results of various simulations with different time steps and iteration numbers,while monitoring the average penetration error in constraints, p. The figure shows that a good precision insatisfying the constraints at the position level can beachieved both by increasing the number of iterationsand by decreasing the time step h.
These plots are obtained for a sample time step,after 5 seconds from the beginning of the granularstacking simulation. Hence, previous time steps and
different scenarios can lead to slightly different graphs.Although this is only a numerical benchmark, it isa worstcase scenario, and its results are indicativeabout the behavior of the solution method when dealing with practical engineering problems that share thesame theoretical difficulties (masonry stability, soilcompaction, etc.).
6.3. Refueling cycle in a pebble bed nuclear reactor
A significant application, which may benefit fromthe robustness and the speed of the method, is the sim
ulation of the granular flow in the pebble bed nuclearreactor PBR [18].
The PBR reactor features a fourthgeneration designbased on a slow recirculation of fuel pebbles in a large
silo: actinides are coated and packed with graphitemoderator in the spherical pebbles, each with a typical diameter of 60 mm, while the helium coolant flowsbetween the pebbles. To increase the efficiency, a central column of spheres could contain only graphite, toflatten the neutron flux. Spheres are slowly extractedfrom the bottom, reprocessed, and reinserted at thetop. The simulation of the downward granular flow
can be useful in estimating statistical parameters suchas the void fraction or the dispersion of the verticalcolumn, hence guiding more efficient designs that canmaximize the burnup of the actinides. A past attemptat simulating a PBR reactor required one week on a64processor supercomputer at Sandia National Laboratories, using the discrete element method (DEM)[34]. Unlike DEM methods, our approach does not introduce stiff force fields, and larger time steps are allowed, so we could perform the simulation on a singlelaptop computer in few hours. On average, this problem involved 170,000 rigid bodies, more than 500,000frictional contacts, leading to more than two millions
of primal and dual variables (Fig. 12).In [42] we presented results of these simulations and
validation against experimental data. Because of thehigh stack of spheres, this example falls in the class ofproblems with slow convergence already discussed inthe previous benchmark: this advocates for future research efforts that could improve the performance ofthe solver by leveraging on multiscale and domain decomposition implementations. Nevertheless, we pointout that even under this circumstance we have shownin [42] that we correctly compute macroscopic parameters of the granular flow in the pebble bed reactor,
such as porosity, while needing only a few hours on alaptop.
7. Conclusions
We presented a formulation for multibody systemswith large amounts of bilateral constraints and frictional contacts, and we developed an iterative methodfor this purpose. Our approach poses the problem as aconvex optimization that can be solved as a cone complementarity problem. The proposed fixedpoint iter
ation has been tailored to feature high performanceeven in large simulation scenarios: special care hasbeen devoted in optimizing algorithms and formulasand avoiding matrix storage, hence obtaining an O(n)space complexity.
The method converges under standard assumptionson the configuration of the system, resulting in a robustalgorithm that can simulate systems with millions ofmultipliers.
Improvements have been presented for the stabilization of contact constraints, resulting in matrixfreeschemes that can correct large interpenetrations ofrigid bodies without running into numerical problems.
We implemented a multibody system based on themethod presented in this paper: the Chrono::Engine library [41]. Aiming at high performance, we expandedsubstantial efforts optimizing the C++ source code.
15

8/2/2019 Preprint Tasora Anitescu Comp
16/17
This software has been already used to simulate complex systems that were hardly tractable with other applications: granular flows in silos, interaction of wheelswith sand and pebbles, size segregation devices, andother problems with a large number of bodies.
The method has been recently ported also on parallelstreamkernel GPU hardware, obtaining a remarkablecomputational efficiency [44].
Acknowledgments
We thank Erwin Coumans for hints about collisiondetection algorithms, and Dan Negrut for commentson an earlier version of the paper. Mihai Anitescu wassupported by Contract DEAC0206CH11357 of theU.S. Department of Energy.
References
[1] M. Anitescu. A fixed timestep approach for multibodydynamics with contact and friction. In 2003 IEEE/RSJInternational Conference on Intelligent Robots andSystems, 2003.(IROS 2003). Proceedings, volume 4, 2003.
[2] M. Anitescu. Optimizationbased simulation of nonsmoothrigid multibody dynamics. Math. Program., 105(1):113143, 2006.
[3] M. Anitescu, J. F. Cremer, and F. A. Potra. Formulating3d contact dynamics problems. Mechanics of Structuresand Machines, 24(4):405437, 1996.
[4] M. Anitescu and G. D. Hart. A constraintstabilizedtimestepping approach for rigid multibody dynamicswith joints, contact and friction. International Journalfor Numerical Methods in Engineering, 60(14):23352371,2004.
[5] M. Anitescu and G. D. Hart. A fixedpoint iterationapproach for multibody dynamics with contact and friction.Mathematical Programming, Series B, 101(1)(ANL/MCSP9850802):332, 2004.
[6] M. Anitescu and F. A. Potra. Formulating dynamic multirigidbody contact problems with friction as solvable linearcomplementarity problems. Nonlinear Dynamics, 14:231247, 1997.
[7] M. Anitescu, F. A. Potra, and D. Stewart. Timesteppingfor threedimensional rigidbody dynamics. ComputerMethods in Applied Mechanics and Engineering, 177:183197, 1999.
[8] M. Anitescu and A. Tasora. An iterative approach forcone complementarity problems for nonsmooth dynamics.Computational Optimization and Applications, pages
Accepted for printing, DOI 10.1007/s1058900892234,2008.
[9] D. Baraff. Issues in computing contact forces for nonpenetrating rigid bodies. Algorithmica, 10:292352, 1993.
[10] D. Baraff. Fast contact force computation fornonpenetrating rigid bodies. In Computer Graphics(Proceedings of SIGGRAPH), pages 2334, 1994.
[11] G. V. D. Bergen and G. J. A. Bergen. Collision Detectionin Interactive 3D Environments. Morgan Kaufmann, 2004.
[12] M. Cline and D. Pai. Poststabilization for rigid bodysimulation with contact and constraints. In IEEEInternational Conference on Robotics and Automation,volume 3, pages 37443751, 2003.
[13] R. Cottle and G. Dantzig. Complementary pivot theoryof mathematical programming. Linear Algebra and Its
Applications, 1:103125, 1968.[14] R. W. Cottle, J.S. Pang, and R. E. Stone. The Linear
Complementarity Problem. Academic Press, Boston, 1992.
[15] B. R. Donald and D. K. Pai. On the motion of compliantlyconnected rigid bodies in contact: a system for analyzing
designs for assembly. In Proceedings of the Conf. onRobotics and Automation, pages 17561762. IEEE, 1990.
[16] S. K. E.G. Gilbert, D.W. Johnson. A fast procedure forcomputing the distance between complex objects in threedimensional space. Robotics and Automation, 4(2):193203,1988.
[17] C. Ericson. RealTime Collision Detection. Elsevier, 2005.
[18] H. D. Gougar. Advanced core design and fuel managementfor pebblebed reactors. Ph.D thesis, Penn State University,Department of Nuclear Engineering, 2004.
[19] G. D. Hart. A Constraintstabilized TimesteppingApproach for Piecewise Smooth Multibody Dynamics.Ph.D thesis, University of Pittsburgh, Department ofMathematics, April 2007.
[20] E. J. Haug. Computer Aided Kinematics and Dynamics ofMechanical Systems. Allyn and Bacon, Boston, 1989.
[21] E. J. Haug, S. Wu, and S. Yang. Dynamic mechanicalsystems with coulomb friction, stiction, impact andconstraint additiondeletion. Mechanisms and MachineTheory, 21(5):407416, 1986.
[22] J.B. HiriartUrruty and C. Lemarechal. Convex Analysisand Minimization Algorithms. Springer Verlag, Berlin,1993.
[23] Y. J. Kim, M. C. Lin, and D. Manocha. Deep: Dualspace expansion for estimating penetration depth betweenconvex polytopes. In Proceedings of the 2002 InternationalConference on Robotics and Automation, volume 1,pages 921926. Institute for Electrical and ElectronicsEngineering, 2002.
[24] P. Lotstedt. Mechanical systems of rigid bodies subjectto unilateral constraints. SIAM Journal of AppliedMathematics, 42(2):281296, 1982.
[25] M. D. P. Marques. Differential Inclusions in NonsmoothMechanical Problems: Shocks and Dry Friction, volume 9of Progress in Nonlinear Differential Equations and TheirApplications. Birkhauser Verlag, Basel, 1993.
[26] J. J. Moreau. Standard inelastic shocks and the dynamicsof unilateral constraints. In G. D. Piero, F. Macieri,and S. Verlag, editors, Unilateral Problems in Structural
Analysis, pages 173221, New York, 1983. CISM Coursesand Lectures no. 288.
[27] M. Muller, B. H. M. Hennix, and J. Ratcliff. Position baseddynamics. In Proceedings of Virtual Reality Interactionsand Physical Simulations, pages 7180, 2006.
[28] K. G. Murty. Linear Complementarity, Linear andNonlinear Programming. Helderman Verlag, Berlin, 1988.
[29] J. Pang and D. Stewart. Solution dependence oninitial conditions in differential variational inequalities.Mathematical Programming, 116(1):429460, 2009.
[30] J.S. Pang, V. Kumar, and P. Song. Convergence of timestepping method for initial and boundaryvalue frictionalcompliant contact problems. SIAM J. Numer. Anal.,43(5):22002226, 2005.
[31] J.S. Pang and J. C. Trinkle. Complementarity
formulations and existence of solutions of dynamic multirigidbody contact problems with coulomb friction. Math.Program., 73(2):199226, 1996.
[32] C. Petra, B. Gavrea, M. Anitescu, and F. Potra. Acomputational study of the use of an optimizationbased method for simulating large multibody systems.Optimization Methods and Software, 24(6):871894, 2009.
[33] F. Pfeiffer and C. Glocker. Multibody Dynamics withUnilateral Contacts. John Wiley, New York City, 1996.
[34] C. Rycroft, G. Grest, J. Landry, and M. Bazant. Analysisof granular flow in a pebblebed nuclear reactor. PhysicalReview E, 74, 021306, 2006.
[35] P. Song, P. Kraus, V. Kumar, and P. Dupont. Analysisof rigidbody dynamic models for simulation of systemswith frictional contacts. Journal of Applied Mechanics,
68(1):118128, 2001.[36] P. Song, J.S. Pang, and V. Kumar. A semiimplicit
timestepping model for frictional compliant contactproblems. International Journal of Numerical Methods inEngineering, 60(13):267279, 2004.
16

8/2/2019 Preprint Tasora Anitescu Comp
17/17
[37] D. Stewart and J.S. Pang. Differential variationalinequalities. Mathematical Programming, 113(2):345424,2008.
[38] D. E. Stewart. Convergence of a timestepping scheme forrigid body dynamics and resolution of Painleves problems.Archive Rational Mechanics and Analysis, 145(3):215260,1998.
[39] D. E. Stewart. Rigidbody dynamics with friction andimpact. SIAM Review, 42(1):339, 2000.
[40] D. E. Stewart and J. C. Trinkle. An implicit timestepping
scheme for rigidbody dynamics with inelastic collisionsand Coulomb friction. International Journal for NumericalMethods in Engineering, 39:26732691, 1996.
[41] A. Tasora. Chrono::Engine project, web page.www.deltaknowledge.com/chronoengine , 2006.
[42] A. Tasora and M. Anitescu. A convex complementarityapproach for simulating large granular flow. Journal ofComputational Nonlinear Dynamics, 2009. To appear.
[43] A. Tasora, E. Manconi, and M. Silvestri. Un nuovometodo del simplesso per il problema di complementaritlineare mista in sistemi multibody con vincoli unilateri. InProceedings of AIMETA 05, Firenze, Italy, 2005.
[44] A. Tasora, D. Negrut, and M. Anitescu. Largescaleparallel multibody dynamics with frictional contact on thegraphical processing unit. Proceedings of the Institution
of Mechanical Engineers, Part K: Journal of MultibodyDynamics, 222(4):315326, 2008.
[45] J. Trinkle, J.S. Pang, S. Sudarsky, and G. Lo. On dynamicmultirigidbody contact problems with Coulomb friction.Zeithschrift fur Angewandte Mathematik und Mechanik,77:267279, 1997.
(To be removed b efore publication) The submitted manuscripthas been created by the University of Chicago as Operator ofArgonne National Laboratory (Argonne) under Contract No.DEAC0206CH11357 with the U.S. Department of Energy. TheU.S. Government retains for itself, and others acting on its behalf, apaidup, nonexclusive, irrevocable worldwide license in said articleto reproduce, prepare derivative works, distribute copies to thepublic, and perform publicly and display publicly, by or on b ehalf
of the Government.