Persistence Diagram Vectorization

PersistenceDiagrams.PersistenceImage — Type

PersistenceImage

PersistenceImage provides a vectorization method for persistence diagrams. Each point in the diagram is first transformed into (birth, persistence) coordinates. Then, it is weighted by a weighting function and widened by a distribution (default: Gaussian with σ=1). Once all the points are transformed, their distributions are summed together and discretized into an image.

The weighting ensures that points near the diagonal contribute little to the image, which keeps this representation of the diagram stable.

Once a PersistenceImage is constructed (see below), it can be called like a function to transform a diagram to an image.

Infinite intervals in the diagram are ignored.
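The steps described above can be sketched in plain Julia. This is an illustration only, not the package's implementation: `sketch_image`, `gauss` and `ramp` are hypothetical names, the weight is assumed to be a linear ramp, and the Gaussian is evaluated at grid nodes instead of being integrated over each pixel.

```julia
# Illustrative sketch; none of these helpers are part of PersistenceDiagrams.

# Isotropic Gaussian density centred at the origin.
gauss(dx, dy, sigma) = exp(-(dx^2 + dy^2) / (2 * sigma^2)) / (2 * pi * sigma^2)

# Assumed weight: 0 at zero persistence, rising linearly to 1 at `slope_end`.
ramp(persistence, slope_end) = clamp(persistence / slope_end, 0.0, 1.0)

function sketch_image(intervals, xs, ys; sigma=1.0, slope_end=1.0)
    image = zeros(length(ys), length(xs))
    for (birth, death) in intervals
        isfinite(death) || continue      # infinite intervals are ignored
        persistence = death - birth      # (birth, death) -> (birth, persistence)
        w = ramp(persistence, slope_end)
        # Add this point's weighted Gaussian to every grid node.
        for (j, x) in enumerate(xs), (i, y) in enumerate(ys)
            image[i, j] += w * gauss(x - birth, y - persistence, sigma)
        end
    end
    return image
end

intervals = [(0.0, 1.0), (0.0, 1.5), (1.0, 2.0), (0.5, Inf)]
xs = range(0.0, 1.0; length=5)   # birth axis
ys = range(0.0, 1.5; length=5)   # persistence axis
img = sketch_image(intervals, xs, ys)
```

In this sketch an interval with zero persistence, or with an infinite death time, leaves the image unchanged.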

Constructors

PersistenceImage(ylims, xlims; kwargs...)

Create an image ranging from ylims[1] to ylims[2] in the $y$ direction and equivalently for the $x$ direction.

PersistenceImage(diagrams; zero_start=true, margin=0.1, kwargs...)

Learn the $x$ and $y$ ranges from diagrams, ensuring all diagrams will fully fit in the image. Limits are increased by the margin. If zero_start is true, set the minimum y value to 0. If all intervals in diagrams have the same birth (e.g. in the zeroth dimension), a single column image is produced.

Keyword Arguments

size: integer or tuple of two integers. Determines the size of the array containing the image. Defaults to 5.
distribution: A function or callable object used to smear each interval in diagram. Has to be callable with two Float64s as input and should return a Float64. Defaults to a normal distribution.
sigma: The width of the normal distribution mentioned above. Only applicable when distribution is unset. Defaults to twice the size of each pixel.
weight: A function or callable object used as the weighting function. Has to be callable with two Float64s as input and should return a Float64. Should equal 0.0 for y=0, but this is not enforced. Defaults to a function that is zero at $y=0$, and increases linearly to 1 until slope_end is reached.
slope_end: the relative $y$ value at which the default weight function stops increasing. Defaults to 1.0.

Example

julia> diag_1 = PersistenceDiagram([(0, 1), (0, 1.5), (1, 2)]);

julia> diag_2 = PersistenceDiagram([(1, 2), (1, 1.5)]);

julia> image = PersistenceImage([diag_1, diag_2])
5×5 PersistenceImage(
  distribution = PersistenceDiagrams.Binormal(0.5499999999999999),
  weight = PersistenceDiagrams.DefaultWeightingFunction(1.65),
)

julia> image(diag_1)
5×5 Matrix{Float64}:
 0.156707  0.164263  0.160452  0.149968  0.133353
 0.344223  0.355089  0.338991  0.308795  0.268592
 0.571181  0.577527  0.535069  0.47036   0.396099
 0.723147  0.714873  0.639138  0.536823  0.432264
 0.700791  0.677237  0.582904  0.46433   0.352962

Reference

Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., ... & Ziegelmeier, L. (2017). Persistence images: A stable vector representation of persistent homology. The Journal of Machine Learning Research, 18(1), 218-252.


PersistenceDiagrams.PersistenceCurve — Type

PersistenceCurve

Persistence curves offer a general way to transform a persistence diagram into a vector of numbers.

This is done by first splitting the time domain into buckets. Then the intervals contained in each bucket are collected and transformed by applying fun to each of them. The result is then summarized with the stat function. If an interval is only partially contained in a bucket, it is counted partially.

Once a PersistenceCurve is constructed (see below), it can be called to convert a persistence diagram to a vector of floats.
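To make the roles of fun and stat concrete, here is a minimal sketch that evaluates a curve at a single time instead of over buckets. The helpers `alive`, `curve_at`, `betti_fun` and `life_fun` are hypothetical and not part of the package; intervals are modelled as plain tuples, and "contained" is simplified to "alive at time t".

```julia
# Illustrative sketch; these helpers are not part of PersistenceDiagrams.

# An interval (b, d) is taken to be alive at time t when b <= t < d.
alive(interval, t) = interval[1] <= t < interval[2]

# Apply `fun` to every interval alive at `t`, then summarize with `stat`.
function curve_at(fun, stat, diagram, t)
    values = Float64[fun(interval, diagram, t) for interval in diagram if alive(interval, t)]
    return stat(values)
end

# Two choices of `fun` following the fun(interval, diagram, time) shape.
betti_fun(interval, diagram, t) = 1.0
life_fun((b, d), diagram, t) = d - b

diagram = [(0.0, 1.0), (0.5, 1.0), (1.0, 1.5)]
betti = curve_at(betti_fun, sum, diagram, 0.75)   # counts the alive intervals
life = curve_at(life_fun, sum, diagram, 0.75)     # adds up their lifetimes
```

Swapping only fun turns a count of alive intervals into a sum of their lifetimes, which is the sense in which the named curves listed further down are special cases of one framework.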

Constructors

PersistenceCurve(fun, stat, start, stop; length=10, integrate=true, normalize=false): length buckets with the first starting on start and the last ending on stop.
PersistenceCurve(fun, stat, diagrams; length=10, integrate=true, normalize=false): learn the start and stop parameters from a collection of persistence diagrams.

Arguments

length: the length of the output. Defaults to 10.
fun: the function applied to each interval. Must have the following signature. fun(::AbstractPersistenceInterval, ::PersistenceDiagram, time)::T
stat: the summary function applied to the results of fun. Must have the following signature. stat(::Vector{T})::Float64
normalize: if set to true, normalize the result. Does not work for time-dependent funs. Defaults to false. Normalization is performed by dividing all values by stat(fun.(diag)).
integrate: if set to true, the amount of overlap between an interval and a bucket is considered. This prevents missing very small bars, but does not work correctly for curves with time-dependent funs where stat is a selection function (such as landscapes). If set to false, the curve is simply sampled at midpoints of buckets. Defaults to true.
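The difference between the two integrate modes can be illustrated for the Betti curve alone. The following is a simplified model, not the package's code: `overlap` and `betti_buckets` are hypothetical helpers, and intervals are plain tuples.

```julia
# Illustrative sketch; these helpers are not part of PersistenceDiagrams.

# Fraction of the bucket [lo, hi] that is covered by the interval (b, d).
overlap(b, d, lo, hi) = max(min(d, hi) - max(b, lo), 0.0) / (hi - lo)

function betti_buckets(diagram, start, stop, n; integrate=true)
    edges = range(start, stop; length=n + 1)
    map(1:n) do i
        lo, hi = edges[i], edges[i + 1]
        if integrate
            # Each interval contributes the fraction of the bucket it covers.
            sum(overlap(b, d, lo, hi) for (b, d) in diagram)
        else
            # Sample at the bucket midpoint: count intervals alive there.
            mid = (lo + hi) / 2
            Float64(count(((b, d),) -> b <= mid < d, diagram))
        end
    end
end

diagram = [(0.0, 1.0), (0.5, 1.0), (0.5, 0.6), (1.0, 1.5), (0.5, Inf)]
integrated = betti_buckets(diagram, 0.0, 2.0, 4)
sampled = betti_buckets(diagram, 0.0, 2.0, 4; integrate=false)
```

With this diagram the short bar (0.5, 0.6) adds about 0.2 to the second bucket in the integrated mode, giving approximately 1.0, 3.2, 2.0, 1.0, while midpoint sampling misses it entirely and gives 1.0, 3.0, 2.0, 1.0.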

Call

(::PersistenceCurve)(diagram; normalize, integrate)

Transforms a diagram. normalize and integrate override defaults set in constructor.

Example

julia> diagram = PersistenceDiagram([(0, 1), (0.5, 1), (0.5, 0.6), (1, 1.5), (0.5, Inf)]);

julia> curve = BettiCurve(0, 2, length = 4)
PersistenceCurve(always_one, sum, 0.0, 2.0; length=4, normalize=false, integrate=true)

julia> curve(diagram)
4-element Vector{Float64}:
 1.0
 3.2
 2.0
 1.0

See Also

The following are equivalent to PersistenceCurve with appropriately selected fun and stat arguments.

BettiCurve
Landscape
Silhuette
Life
Midlife
LifeEntropy
MidlifeEntropy
PDThresholding

More options are listed in Table 1 on page 9 of the reference below.

Reference

Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.


PersistenceDiagrams.BettiCurve — Function

BettiCurve

Betti curves count the Betti numbers at each time step. Unlike most vectorization methods, they support infinite intervals.

fun(_, _, _) = 1.0
stat = sum

See also

PersistenceCurve


PersistenceDiagrams.Life — Function

Life

The life curve.

fun((b, d), _, _) = d - b
stat = sum

See also

PersistenceCurve

Reference

Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.


PersistenceDiagrams.Midlife — Function

Midlife

The midlife curve.

fun((b, d), _, _) = (b + d) / 2
stat = sum

See also

PersistenceCurve

Reference

Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.


PersistenceDiagrams.LifeEntropy — Function

LifeEntropy

The life entropy curve.

fun((b, d), diag, _) = begin
    x = (d - b) / sum(d - b for (b, d) in diag)
    -x * log2(x)
end
stat = sum
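As a worked illustration of the definition above, the sketch below evaluates the life entropy at a single time. `life_entropy_at` is a hypothetical helper, not a package function; it assumes finite intervals given as plain tuples and samples at a point rather than integrating over a bucket.

```julia
# Illustrative sketch; this helper is not part of PersistenceDiagrams.
function life_entropy_at(diagram, t)
    # Normalizing constant: total lifetime over the whole diagram.
    total = sum(d - b for (b, d) in diagram)
    terms = Float64[]
    for (b, d) in diagram
        b <= t < d || continue       # keep only intervals alive at t
        x = (d - b) / total          # this interval's share of the total lifetime
        push!(terms, -x * log2(x))
    end
    return sum(terms)
end

# Two intervals with equal lifetimes: each has a share of 0.5.
diagram = [(0.0, 1.0), (0.0, 1.0)]
entropy_inside = life_entropy_at(diagram, 0.5)
entropy_outside = life_entropy_at(diagram, 2.0)
```

Here both intervals are alive at t = 0.5 and each contributes -0.5 * log2(0.5) = 0.5, so the value is 1.0; at t = 2.0 no interval is alive and the value is 0.0.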

See also

PersistenceCurve

Reference

Atienza, N., González-Díaz, R., & Soriano-Trigueros, M. (2018). On the stability of persistent entropy and new summary functions for TDA. arXiv preprint arXiv:1803.08304.


PersistenceDiagrams.MidlifeEntropy — Function

MidlifeEntropy

The midlife entropy curve.

fun((b, d), diag, _) = begin
    x = (b + d) / sum(b + d for (b, d) in diag)
    -x * log2(x)
end
stat = sum

See also

PersistenceCurve

Reference

Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.


PersistenceDiagrams.PDThresholding — Function

PDThresholding

The persistence diagram thresholding function.

fun((b, d), _, t) = (d - t) * (t - b)
stat = mean

See also

PersistenceCurve

Reference

Chung, Y. M., & Day, S. (2018). Topological fidelity and image thresholding: A persistent homology approach. Journal of Mathematical Imaging and Vision, 60(7), 1167-1179.


PersistenceDiagrams.Landscapes — Type

Landscapes(n, args...)

The first n persistence landscapes.

fun((b, d), _, t) = max(min(t - b, d - t), 0)
stat = get(sort(values, rev=true), k, 0.0)

Vectorizes to a matrix where each column is a landscape.

See also

PersistenceCurve
Landscape

Reference

Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1), 77-102.


PersistenceDiagrams.Landscape — Function

Landscape(k, args...)

The k-th persistence landscape.

fun((b, d), _, t) = max(min(t - b, d - t), 0)
stat = get(sort(values, rev=true), k, 0.0)
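The definition above can be evaluated by hand at a single time. The following sketch uses hypothetical helpers `tent` and `landscape_at` on plain tuples; it is not the package's implementation.

```julia
# Illustrative sketch; these helpers are not part of PersistenceDiagrams.

# Tent function of the interval (b, d), evaluated at time t.
tent(b, d, t) = max(min(t - b, d - t), 0.0)

# k-th largest tent value at time t, or 0.0 when fewer than k intervals exist.
function landscape_at(diagram, k, t)
    values = sort([tent(b, d, t) for (b, d) in diagram]; rev=true)
    return get(values, k, 0.0)
end

diagram = [(0.0, 2.0), (1.0, 3.0)]
first_at_1 = landscape_at(diagram, 1, 1.0)    # peak of the first tent
second_at_1 = landscape_at(diagram, 2, 1.0)   # second tent has not started rising yet
third_at_15 = landscape_at(diagram, 3, 1.5)   # only two intervals, so the default is used
```

At t = 1.0 the tent values are 1.0 and 0.0, so the first landscape is 1.0 and the second is 0.0. At t = 1.5 both tents equal 0.5, and any k above 2 falls back to 0.0.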

See also

PersistenceCurve
Landscapes

Reference

Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1), 77-102.


PersistenceDiagrams.Silhuette — Function

Silhuette

The sum of persistence landscapes for all values of k.

fun((b, d), _, t) = max(min(t - b, d - t), 0)
stat = sum

See also

PersistenceCurve
Landscape
Landscapes
