FLIP

Real-Time and Resilient Formation Planning for Large-Scale Distributed Swarms via Point Cloud Registration

Yuan Zhou, Guangtong Xu, Zhenyu Hou, Jialiang Hou, and Fei Gao

Read the paper on arXiv

Large drone swarms become harder to coordinate as the number of drones grows. A 10-drone formation can often get away with each drone checking in on many others. A 100-drone formation has far less room for that overhead. If every drone has to reason about every other drone, the system gets slower, needs more communication, and becomes easier to disrupt. One slow planner, one failed agent, or one bad local path can affect the rest of the group.

The paper, "FLIP: Real-Time and Resilient Formation Planning for Large-Scale Distributed Swarms via Point Cloud Registration," treats a drone formation as a shape. The drones are points in space, and the desired formation is another set of points. FLIP compares those two shapes, lines them up, filters out drones that are clearly behaving strangely, and uses the result to decide where each drone should go next.

In more technical terms, the paper treats formation planning as point cloud registration over time. Each drone looks at where the rest of the swarm is expected to be, compares that shape to the desired formation, rejects outliers, creates its own Optimal Formation Position Sequence, and then plans a path toward that sequence.

TL;DR

FLIP turns large-scale formation planning into a shape-matching problem across time. Each drone receives predicted paths from the other drones, checks where they are expected to be in the near future, lines that future swarm shape up with the desired formation, and ignores abnormal agents using RANSAC, a method for fitting a model while filtering out bad data points. The result becomes the drone's own target path.

FLIP plans formations for 100 drones with average per-drone planning time around 0.040 seconds. The paper also demonstrates a 120-drone rocket-shaped formation with average per-agent planning time around 0.09 seconds and formation error 0.6830. When the authors test the shape-matching step alone with 1000 points, it stays under roughly 0.06 seconds. In these experiments, maps and collision checking appear to be heavier parts of the system than the shape-matching step itself.

For drone swarm products, reliability matters more than polished formation demos. A useful planner needs to keep 100+ drones in formation when some drones are late, wrong, disconnected, or behaving abnormally. FLIP focuses on that version of the problem.

The market insight: swarms need resilience more than choreography

Most swarm demos optimize for visual effect: synchronized motion, clean spacing, impressive shape changes. Real products need the swarm to remain useful when the network is imperfect, some drones produce bad paths, some agents lose communication, and a few drones behave abnormally.

The same constraint shows up across drone light shows, warehouse inspection, distributed mapping, search and rescue, agricultural survey, perimeter monitoring, industrial inspection, and eventually robotic construction. In these settings, formation geometry defines sensing coverage, communication layout, safety margins, redundancy, and task decomposition.

There is a meaningful commercial difference between formation as a performance and formation as an operating primitive. A performance can be planned ahead of time. An operating primitive has to work live, on many drones, while conditions are changing. FLIP moves formation control toward that second category.

The paper's main design choice is how it represents the swarm. Describing the swarm as a dense network of relationships between drones makes the math heavy quickly. Simplifying that network too much can make the formation lose its shape. Point cloud registration keeps the whole formation shape in view without forcing every drone to compare itself against every other drone.

The core problem

The goal is to keep the current swarm close to a desired formation while the drones move around obstacles. One way to measure formation error is to compare the current swarm with the desired formation after applying the best rotation, translation, and scaling, so that only differences in shape are counted.

$$\hat{e}_{dist} = \min_{R,t,s} \sum_{i=1}^{N_a} \left\|p_i^{des} - \left(s R p_i^{cur} + t\right)\right\|^2$$

Here:

  • $N_a$ is the number of agents.
  • $p_i^{des}$ is the desired formation position of drone $i$.
  • $p_i^{cur}$ is the current position of drone $i$.
  • $R \in SO(3)$ is the rotation.
  • $t \in \mathbb{R}^3$ is the translation.
  • $s \in \mathbb{R}$ is the scale.

The swarm's current positions form one cloud of points. The desired formation positions form another. The planner tries to find the best way to line up those two clouds.
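As a rough illustration of that alignment, here is a minimal planar sketch. It is a simplification, not the paper's implementation: positions are 2D and stored as complex numbers, so a single complex factor `a` carries scale and rotation together and `b` is the translation. The names `fit_similarity` and `formation_error` are made up for this example.

```python
from typing import List, Tuple


def fit_similarity(src: List[complex], dst: List[complex]) -> Tuple[complex, complex]:
    """Return (a, b) minimizing sum |dst_k - (a * src_k + b)|^2.

    In the plane, a = s * exp(1j * theta) encodes scale and rotation,
    and b is the translation.
    """
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((z - src_mean).conjugate() * (w - dst_mean) for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    a = num / den
    return a, dst_mean - a * src_mean


def formation_error(current: List[complex], desired: List[complex]) -> float:
    """Squared residual left after the best similarity alignment of current onto desired."""
    a, b = fit_similarity(current, desired)
    return sum(abs(d - (a * c + b)) ** 2 for c, d in zip(current, desired))


# Desired formation: a unit square. Current swarm: the same square rotated by
# 90 degrees, doubled in size, and shifted, so the shape itself is intact.
desired = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
current = [2j * p + (3 + 1j) for p in desired]
intact_error = formation_error(current, desired)

# Push one drone out of its slot: no similarity transform can absorb that.
distorted = list(current)
distorted[2] += 0.4
distorted_error = formation_error(distorted, desired)
```

The intact swarm scores essentially zero even though it is rotated, scaled, and moved, while the distorted swarm leaves a residual. That is the property the error metric is after: it measures shape, not pose.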

This shifts formation planning toward geometric alignment.

Why graph-based formation planning struggles

Prior formation planners usually take one of three approaches.

First, some methods use simplified forces, probability maps, or virtual structures. These can handle more drones, but the final shape can drift, especially around obstacles.

Second, fully connected methods compare every drone against the whole formation. These preserve quality, but they get expensive as the swarm grows. In the paper's benchmark, Quan's fully connected method reaches mean planning time of 1.023 seconds at only 40 drones, which is too slow for live formation maintenance.

Third, sparse-graph methods reduce the work by only checking selected drone-to-drone links. This helps, but it can weaken the formation shape at large scale. In the paper's benchmark, Zhou's sparse method reaches 0.933 seconds mean planning time at 100 drones and fails to maintain the formation.

FLIP does not make graph-based planning useless. It makes a narrower point: when the main goal is preserving the whole formation shape, describing that shape as a point cloud can be lighter and more direct.

The FLIP method in one sentence

Each drone looks at where the other drones are expected to be, treats those future positions as shapes, lines each shape up with the desired formation, ignores outliers, and uses the result to build its own future target path.

The paper calls that future target path the Optimal Formation Position Sequence, or OFPS.

The distributed planning loop

For each drone $i$, the system works roughly like this:

  1. Other drones broadcast their planned trajectories.
  2. Drone $i$ samples those trajectories over a future time horizon.
  3. At each future timestamp $m$, drone $i$ builds a point cloud from the predicted positions of all other agents.
  4. It registers that point cloud to the desired formation.
  5. It uses the estimated registration transform to compute where it should be at that timestamp.
  6. It optimizes its own trajectory using that generated sequence as a formation constraint.
  7. It broadcasts its new local trajectory back to the swarm.

There is no single central planner computing every path. Each drone does its own planning using the paths it hears from the other drones.
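The data flow of that loop can be sketched as follows. This is only scaffolding under stated assumptions: the round is synchronous, positions are planar complex numbers, and the two callables are deliberately crude stand-ins (a translation-only "registration" and a halfway blend instead of a real trajectory optimizer). All names here are hypothetical.

```python
from typing import Callable, Dict, List

Trajectory = List[complex]  # planar positions at fixed future timestamps


def planning_round(
    broadcast: Dict[int, Trajectory],
    desired: Dict[int, complex],
    build_targets: Callable[..., Trajectory],
    optimize: Callable[[Trajectory, Trajectory], Trajectory],
) -> Dict[int, Trajectory]:
    """One synchronous round in which every drone replans from what it heard."""
    updated = {}
    for i, own in broadcast.items():
        # Steps 1-3: drone i only uses the trajectories received from the others.
        others = {j: traj for j, traj in broadcast.items() if j != i}
        # Steps 4-5: registration against the desired formation gives per-timestamp targets.
        targets = build_targets(i, others, desired)
        # Step 6: local optimization with the targets as a formation constraint.
        updated[i] = optimize(own, targets)
    # Step 7: the caller rebroadcasts the updated trajectories.
    return updated


def translation_only_targets(i, others, desired):
    """Stand-in for registration: estimate only the mean offset of the other drones."""
    horizon = len(next(iter(others.values())))
    targets = []
    for m in range(horizon):
        offset = sum(traj[m] - desired[j] for j, traj in others.items()) / len(others)
        targets.append(desired[i] + offset)
    return targets


def blend_halfway(own, targets):
    """Stand-in for the local optimizer: move each sample halfway to its target."""
    return [0.5 * (p + t) for p, t in zip(own, targets)]


desired = {0: 0 + 0j, 1: 1 + 0j, 2: 2 + 0j}
# Drones 0 and 1 hold their slots shifted by +5; drone 2 sits 2 units too high.
broadcast = {0: [5 + 0j] * 3, 1: [6 + 0j] * 3, 2: [7 + 2j] * 3}
replanned = planning_round(broadcast, desired, translation_only_targets, blend_halfway)
```

In this toy run drone 2 is pulled back toward its slot, but drone 0 is also dragged upward because the naive stand-in trusts drone 2's bad position. That side effect is the reason the registration step needs outlier rejection.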

The working approximation

The exact version of the problem asks for the best position $p_i^*$ for drone $i$ while considering how every drone affects the overall shape alignment:

$$F_{PCR}(p_i, C) = \min_{R,t,s} \left( \sum_{j \in N_a \setminus \{i\}} \left\|p_j^{des} - \left(sR p_j^{cur} + t\right)\right\|^2 + \left\|p_i^{des} - \left(sR p_i + t\right)\right\|^2 \right)$$

Solving this exactly would be expensive. The planner would have to repeatedly update $p_i$, and each update would require another point cloud registration problem. That cost becomes impractical for large swarms.

FLIP uses a practical shortcut. In a large swarm, one drone usually has only a small effect on the overall shape match. Drone $i$ can ignore its own current contribution and estimate the shape alignment from the other agents:

$$\min_{R,t,s} \sum_{j \in N_a \setminus \{i\}} \left\|p_j^{des} - \left(sR p_j^{cur} + t\right)\right\|^2$$

Then it computes its own optimal formation point from the fitted transform:

$$p_i^{\#} = s^{\#} R^{\#} p_i^{des} + t^{\#}$$

This accepts a small loss of exactness to save computation. In a 100-agent swarm, leaving one agent out of the shape match barely changes the overall result, but it avoids expensive repeated solving for that agent's exact best position.
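A minimal planar sketch of the shortcut, under stated assumptions: positions are 2D complex numbers, and the transform is fitted from the desired formation to the observed positions of the other drones, so that applying it to drone $i$'s desired slot gives a target in world coordinates. That fitting direction is a choice made for this example, and `fit_similarity` and `formation_target` are hypothetical names.

```python
import cmath
from typing import List, Tuple


def fit_similarity(src: List[complex], dst: List[complex]) -> Tuple[complex, complex]:
    """Return (a, b) minimizing sum |dst_k - (a * src_k + b)|^2 (a = scale * rotation)."""
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((z - src_mean).conjugate() * (w - dst_mean) for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    a = num / den
    return a, dst_mean - a * src_mean


def formation_target(i: int, current: List[complex], desired: List[complex]) -> complex:
    """Fit the alignment from every drone except i, then map drone i's desired slot."""
    others = [j for j in range(len(desired)) if j != i]
    a, b = fit_similarity([desired[j] for j in others], [current[j] for j in others])
    return a * desired[i] + b


# Ten drones on a 5 x 2 grid, currently rotated, scaled, and shifted as a group.
desired = [complex(x, y) for y in range(2) for x in range(5)]
a_true = 1.5 * cmath.exp(0.3j)
b_true = 4 - 2j
current = [a_true * p + b_true for p in desired]
current[3] += 6 + 6j  # drone 3 itself is far from where it should be

slot_3 = a_true * desired[3] + b_true
target_3 = formation_target(3, current, desired)

# For contrast: a fit that includes drone 3's own misplaced position.
a_all, b_all = fit_similarity(desired, current)
biased_target_3 = a_all * desired[3] + b_all
```

Because drone 3's own position never enters the fit, its target lands on the slot implied by the rest of the swarm. The fit that includes drone 3 is pulled toward its misplaced position, which is the coupling the exact formulation would have to resolve iteratively.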

From one position to a future sequence

Formation planning needs more than a target for the current moment. The drone also needs to know where it should be over the next few seconds. FLIP samples the broadcast paths of the other drones across a future window:

$$P_{i,all}(m) = \{p_j(m) \mid j \in N_a \setminus \{i\}\}$$

For each future timestamp $m$, it solves a registration problem:

$$\min_{R_m,t_m,s_m} \sum_{j \in N_a \setminus \{i\}} \left\|p_j^{des} - \left(s_m R_m p_j(m) + t_m\right)\right\|^2$$

Then it produces the desired future point for drone $i$:

$$p_{i,f}^{\#}(m) = s_m^{\#} R_m^{\#} p_i^{des} + t_m^{\#}$$

Stack those across timestamps and you get the OFPS:

$$p_{i,f}^{\#} = \left\{p_{i,f}^{\#}(0),\dots,p_{i,f}^{\#}(M_c)\right\}$$

That sequence becomes the target that the drone's local planner tries to follow.
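The per-timestamp registration can be sketched like this. It is a planar simplification with hypothetical names: trajectories are lists of complex positions already sampled at timestamps 0 to `m_c`, and the transform at each timestamp is fitted from the desired formation to the predicted positions so that the resulting targets are in world coordinates.

```python
import cmath
from typing import Dict, List, Tuple


def fit_similarity(src: List[complex], dst: List[complex]) -> Tuple[complex, complex]:
    """Return (a, b) minimizing sum |dst_k - (a * src_k + b)|^2 (a = scale * rotation)."""
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((z - src_mean).conjugate() * (w - dst_mean) for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    a = num / den
    return a, dst_mean - a * src_mean


def build_ofps(
    i: int,
    trajectories: Dict[int, List[complex]],
    desired: Dict[int, complex],
    m_c: int,
) -> List[complex]:
    """One registration per future timestamp, using every drone except i."""
    others = [j for j in trajectories if j != i]
    src = [desired[j] for j in others]
    sequence = []
    for m in range(m_c + 1):
        dst = [trajectories[j][m] for j in others]  # predicted cloud at timestamp m
        a_m, b_m = fit_similarity(src, dst)
        sequence.append(a_m * desired[i] + b_m)  # drone i's target at timestamp m
    return sequence


def pose(m: int) -> Tuple[complex, complex]:
    """Toy group motion: the formation turns slowly while drifting along x."""
    return cmath.exp(0.1j * m), 0.5 * m + 0j


M_C = 4
desired = {j: complex(j % 3, j // 3) for j in range(6)}  # 3 x 2 grid
trajectories = {
    j: [pose(m)[0] * p + pose(m)[1] for m in range(M_C + 1)] for j, p in desired.items()
}
ofps_4 = build_ofps(4, trajectories, desired, M_C)
expected_4 = [pose(m)[0] * desired[4] + pose(m)[1] for m in range(M_C + 1)]
```

When the other drones move as a rigid group, the recovered sequence for drone 4 follows the same rotation and drift, which is what lets the target move with the formation instead of staying pinned to a fixed slot.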

Why RANSAC matters

FLIP pairs shape matching with outlier rejection.

In a real swarm, not every drone should carry equal weight. Some drones may be delayed, stuck, or forced into bad paths around obstacles. If those drones are trusted too much, their bad state can pull the rest of the swarm in the wrong direction.

FLIP uses RANSAC to reject outlier agent positions during registration. It looks for the drones that still agree with the main formation shape and bases the match on them. Drones that are far off become outliers instead of bad inputs that drag the whole system around.

The paper's RANSAC parameters are:

| Parameter | Symbol | Value |
| --- | --- | --- |
| Inlier threshold | $\tau_{ran}$ | 0.15 |
| Confidence | $p_{ran}$ | 0.99 |
| Maximum iterations | $k_{ran}$ | 1000 |

RANSAC is simple, fast, and well tested. The paper also cites newer robust registration methods, but RANSAC gives a good balance between speed and resilience.
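To make the outlier rejection concrete, here is a small planar RANSAC sketch that reuses the three parameter values from the table. Everything else is an assumption of this example rather than the paper's implementation: 2D positions stored as complex numbers, a two-point minimal sample, a Euclidean residual test, the textbook adaptive stopping rule, and a final least-squares refit on the inliers.

```python
import cmath
import math
import random
from typing import List, Tuple


def fit_similarity(src: List[complex], dst: List[complex]) -> Tuple[complex, complex]:
    """Return (a, b) minimizing sum |dst_k - (a * src_k + b)|^2 (a = scale * rotation)."""
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((z - src_mean).conjugate() * (w - dst_mean) for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    a = num / den
    return a, dst_mean - a * src_mean


def ransac_similarity(src, dst, tau=0.15, confidence=0.99, max_iters=1000, seed=0):
    """Robust planar similarity fit. Returns (a, b, inlier_indices)."""
    rng = random.Random(seed)
    n = len(src)
    best: List[int] = []
    needed = float(max_iters)
    trials = 0
    while trials < min(needed, max_iters):
        trials += 1
        i, j = rng.sample(range(n), 2)  # two pairs determine a planar similarity
        if src[i] == src[j]:
            continue
        a = (dst[i] - dst[j]) / (src[i] - src[j])
        b = dst[i] - a * src[i]
        inliers = [k for k in range(n) if abs(dst[k] - (a * src[k] + b)) <= tau]
        if len(inliers) > len(best):
            best = inliers
            w = len(best) / n  # current estimate of the inlier ratio
            # Trials needed to draw one all-inlier sample with the requested confidence.
            needed = 0.0 if w == 1.0 else math.log(1 - confidence) / math.log(1 - w * w)
    a, b = fit_similarity([src[k] for k in best], [dst[k] for k in best])
    return a, b, best


# Twenty drones on a 5 x 4 grid; three of them are far from their slots.
desired = [complex(x, y) for y in range(4) for x in range(5)]
a_true = 1.2 * cmath.exp(-0.4j)
b_true = 10 + 3j
current = [a_true * p + b_true for p in desired]
abnormal = {2: 3 + 1j, 7: -2 + 4j, 13: 5 - 3j}
for k, offset in abnormal.items():
    current[k] += offset

a_fit, b_fit, inliers = ransac_similarity(desired, current)
```

The three displaced drones end up outside the inlier set, and the refit on the remaining seventeen recovers the group transform. A plain least-squares fit over all twenty would instead be skewed toward the abnormal agents.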

The local trajectory optimization

Once the OFPS is computed, each drone still needs a path it can actually fly. The paper uses MINCO, a minimum-control-effort trajectory representation for creating smooth trajectories. The trajectory is optimized over waypoints $q$ and segment times $T$:

$$\min_{q,T} \int_{t_0}^{t_M} \left\|p^{(s)}(t)\right\|^2 dt + \rho T_\Sigma$$

subject to endpoint conditions, dynamic feasibility, obstacle clearance, reciprocal swarm avoidance, and formation coordination.

The formation coordination term is:

$$J_f = \sum_{j=0}^{M_c} \left\|p(j) - p_{i,f}^{\#}(j)\right\|^2$$

The term pulls the planned path toward the future formation positions produced by FLIP. The planner also accounts for obstacles, drone limits, collision avoidance, and time/energy goals. The final math problem is solved with L-BFGS, a standard optimization method.
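The formation term on its own is easy to evaluate. The sketch below is an illustration only: it treats the sampled path positions as free variables and takes one plain gradient step, whereas the paper optimizes waypoints and segment times with L-BFGS and balances this term against the other constraints. Positions are planar complex numbers and the function names are hypothetical.

```python
from typing import List


def formation_cost(path: List[complex], ofps: List[complex]) -> float:
    """J_f: summed squared distance between sampled path points and OFPS targets."""
    return sum(abs(p - t) ** 2 for p, t in zip(path, ofps))


def formation_gradient(path: List[complex], ofps: List[complex]) -> List[complex]:
    """Gradient of J_f per sample, 2 * (p(j) - target(j)), stored as a complex number."""
    return [2 * (p - t) for p, t in zip(path, ofps)]


# A straight path along x, with formation targets one unit higher.
path = [0 + 0j, 1 + 0j, 2 + 0j]
ofps = [0 + 1j, 1 + 1j, 2 + 1j]
cost_before = formation_cost(path, ofps)

# One gradient step with step size 0.25 halves every gap to its target.
step = 0.25
gradient = formation_gradient(path, ofps)
path_after = [p - step * g for p, g in zip(path, gradient)]
cost_after = formation_cost(path_after, ofps)
```

Halving each gap cuts the quadratic cost to a quarter. In the full problem this pull competes with obstacle clearance, dynamic limits, and collision avoidance, so the optimized path generally does not reach the targets exactly.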

FLIP does more than assign each drone to a slot. It gives each drone a moving target over time, then lets the local planner figure out how to reach that target safely.

The benchmark numbers

The paper compares FLIP against two prior methods: Quan's fully connected method and Zhou's sparse-graph resilient method.

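| Method | 20 drones mean time | 40 drones mean time | 60 drones mean time | 80 drones mean time | 100 drones mean time |
| --- | --- | --- | --- | --- | --- |
| Quan's | 0.055 s | 1.023 s | 1.848 s | 3.053 s | 5.814 s |
| Zhou's | 0.025 s | 0.038 s | 0.073 s | 0.516 s | 0.933 s |
| FLIP | 0.012 s | 0.014 s | 0.019 s | 0.026 s | 0.040 s |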
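The scaling gap in these timings is easier to see as ratios. The snippet below only does arithmetic on the mean planning times reported above; the variable names are made up for the example.

```python
# Mean planning time in seconds at 20, 40, 60, 80, and 100 drones, as reported above.
mean_time_s = {
    "Quan's": [0.055, 1.023, 1.848, 3.053, 5.814],
    "Zhou's": [0.025, 0.038, 0.073, 0.516, 0.933],
    "FLIP": [0.012, 0.014, 0.019, 0.026, 0.040],
}

# How much slower each baseline is than FLIP at 100 drones.
slowdown_at_100 = {
    name: times[-1] / mean_time_s["FLIP"][-1]
    for name, times in mean_time_s.items()
    if name != "FLIP"
}

# How much each method's own planning time grows from 20 to 100 drones.
growth_20_to_100 = {name: times[-1] / times[0] for name, times in mean_time_s.items()}
```

At 100 drones the baselines come out roughly 145 times (Quan's) and 23 times (Zhou's) slower than FLIP. Going from 20 to 100 drones, FLIP's own planning time grows by a factor of about 3.3, compared with about 106 for Quan's and about 37 for Zhou's.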

The formation error increases with swarm size, but FLIP remains real-time where the baselines fail or degrade:

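| Method | 20 drones error | 40 drones error | 60 drones error | 80 drones error | 100 drones error |
| --- | --- | --- | --- | --- | --- |
| Quan's | 0.3403 | N/A | N/A | N/A | N/A |
| Zhou's | 0.4422 | 0.7620 | 0.9870 | 1.9631 | N/A |
| FLIP | 0.3680 | 0.6065 | 0.7355 | 0.9352 | 1.2580 |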

The authors also demonstrate a 120-drone rocket-shaped formation in an obstacle environment with:

$$N_a = 120, \quad t_{mean} \approx 0.09\,\text{s}, \quad \bar{e}_{dist} = 0.6830$$

The main systems result is a real-time planning loop for 100+ simulated agents.

Resilience result: handling abnormal agents

The resilience experiment tests formation planning under different numbers of outlier agents. Quan's method has no abnormal-agent handling, so even a small number of abnormal agents can propagate bad behavior through the cooperative network. Zhou's method removes abnormal agents before trajectory planning, which helps at low outlier counts but struggles as the number of outliers increases.

FLIP rejects abnormal agents during the spatiotemporal PCR step, so outlier rejection happens across the future formation sequence instead of as a one-time preprocessing step.

The paper reports that even with roughly 12% abnormal agents, the remaining normal agents keep their formation error below 0.5. The result suggests that the swarm can keep operating while partially degraded, instead of only working in clean conditions.
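A back-of-the-envelope check helps explain why a 12% outlier rate is comfortable for RANSAC. The snippet uses the textbook trial-count formula, which the paper does not report in this form. Mapping "12% abnormal agents" to an inlier ratio of 0.88 and using a minimal sample of three point pairs for a 3D similarity transform are assumptions of this example.

```python
import math


def ransac_trials(inlier_ratio: float, sample_size: int, confidence: float = 0.99) -> int:
    """Trials needed so that, with the given confidence, at least one minimal
    sample contains only inliers (standard RANSAC estimate)."""
    p_clean_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_clean_sample))


trials_at_12_percent = ransac_trials(0.88, 3)  # 12% abnormal agents
trials_at_50_percent = ransac_trials(0.50, 3)  # a much harsher case, for comparison
```

Under these assumptions only a handful of trials are needed at a 12% outlier rate, and even a 50% outlier rate needs a few dozen, so the cap of 1000 iterations leaves a wide margin.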

The slender-formation insight

One subtle result is that FLIP performs better on elongated formations than graph Laplacian methods. In normalized Laplacian representations, the long axis of a slender formation can dominate the shape descriptor. That can make the planner less sensitive to short-axis deformation.

FLIP avoids this because it treats the desired formation directly as a point cloud rather than a graph Laplacian shape. In the paper's elongated rectangle experiment, FLIP achieves $\bar{e}_{dist} = 0.028$ in obstacle-free conditions, compared with $\bar{e}_{dist} = 0.236$ for Quan's method. With obstacles, FLIP reports $\bar{e}_{dist} = 0.153$, compared with $\bar{e}_{dist} = 0.394$ for Quan's method.

Some formation shapes are poorly served by graph normalization. A direct geometric representation can preserve sensitivity across axes.

What is actually new here?

The individual ingredients are familiar. Distributed trajectory planning, point cloud registration, RANSAC, polynomial trajectory optimization, and L-BFGS all have long histories.

The novelty is the composition:

$$\text{Formation planning} \rightarrow \text{spatiotemporal PCR} \rightarrow \text{RANSAC outlier rejection} \rightarrow \text{local trajectory optimization}$$

The pipeline turns swarm coordination into a sequence of mature estimation and optimization problems. Instead of inventing an entirely new formation controller, the authors use a known geometric primitive: align two point clouds robustly.

Limits of the result

FLIP is a planning architecture, not a learned swarm policy, neural world model, end-to-end autonomy system, or complete real-world deployment paper.

The experiments are simulation-based. The benchmark machine is an Intel i7-8700K CPU with 32 GB RAM. The authors also note that some computation in simulation comes from maintaining local maps for all agents, which would not necessarily exist in the same way on deployed robots. Communication bandwidth also remains a major issue, and the paper's future work explicitly points toward group-wise communication and global path planning.

FLIP offers a planning architecture for large-scale formation maintenance. A full production swarm stack would still need real hardware validation, communication design, safety handling, and integration work.

Why this matters for drone autonomy products

Multi-drone autonomy depends heavily on how the system represents the group.

A drone swarm operating system needs primitives like:

  • maintain coverage geometry,
  • preserve sensing baselines,
  • reject failed agents,
  • reroute around obstacles,
  • distribute computation,
  • operate with partial communication,
  • scale from 10 agents to 100+ agents without redesign.

FLIP gives one candidate primitive: formation as robust point cloud registration.

This is relevant for drone developer platforms. Developers should be able to declare a formation, task, or coverage geometry without manually engineering pairwise constraints for every new swarm shape. A PCR-based formation layer could become the bridge between high-level intent and low-level trajectory optimization.

In product terms, the abstraction would look like:

formation = PointCloudFormation(shape="inspection_wall", agents=120)
swarm.track_formation(formation, reject_outliers=True, avoid_obstacles=True)

Underneath, the system would repeatedly register the actual swarm state to the desired formation, reject outliers, generate per-agent OFPS targets, and solve local trajectories.

A primitive like this would make swarm autonomy easier to reuse across applications.

