Persistence Diagram Vectorization
PersistenceDiagrams.PersistenceImage — Type
PersistenceImage
PersistenceImage provides a vectorization method for persistence diagrams. Each point in the diagram is first transformed into (birth, persistence) coordinates. Then, it is weighted by a weighting function and widened by a distribution (default: Gaussian with σ=1). Once all the points are transformed, their distributions are summed together and discretized into an image.
The weighting ensures that points near the diagonal contribute little, which keeps this representation of the diagram stable.
Once a PersistenceImage is constructed (see below), it can be called like a function to transform a diagram to an image.
Infinite intervals in the diagram are ignored.
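The steps described above (change of coordinates, weighting, smearing, summing, discretizing) can be sketched in plain Julia without the package. This is only an illustration under stated assumptions: sketch_image, gaussian and npix are hypothetical names, the weight is simply the persistence itself, and each pixel is sampled at its centre rather than integrated over. The row and column ordering of the package's actual output may differ.

```julia
# Illustrative sketch only; not the PersistenceDiagrams implementation.

# Isotropic Gaussian centred at (cx, cy), evaluated at (x, y).
gaussian(x, y, cx, cy, σ) = exp(-((x - cx)^2 + (y - cy)^2) / (2σ^2)) / (2π * σ^2)

function sketch_image(intervals, xlims, ylims; npix=5, σ=1.0, weight=(b, p) -> p)
    # Pixel edges and centres along each axis.
    xs = range(xlims[1], xlims[2]; length=npix + 1)
    ys = range(ylims[1], ylims[2]; length=npix + 1)
    xc = [(xs[i] + xs[i + 1]) / 2 for i in 1:npix]
    yc = [(ys[i] + ys[i + 1]) / 2 for i in 1:npix]

    image = zeros(npix, npix)
    for (birth, death) in intervals
        isfinite(death) || continue      # infinite intervals are ignored
        pers = death - birth             # (birth, persistence) coordinates
        w = weight(birth, pers)          # small near the diagonal
        for (j, x) in enumerate(xc), (i, y) in enumerate(yc)
            image[i, j] += w * gaussian(x, y, birth, pers, σ)
        end
    end
    return image
end

img = sketch_image([(0.0, 1.0), (0.0, 1.5), (1.0, 2.0), (0.5, Inf)],
                   (0.0, 1.0), (0.0, 1.65))
```

Because the assumed weight is zero at zero persistence, a point lying on the diagonal adds nothing to the image in this sketch.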
Constructors
PersistenceImage(ylims, xlims; kwargs...)
Create an image ranging from ylims[1] to ylims[2] in the $y$ direction and equivalently for the $x$ direction.
PersistenceImage(diagrams; zero_start=true, margin=0.1, kwargs...)
Learn the $x$ and $y$ ranges from diagrams, ensuring all diagrams will fully fit in the image. Limits are increased by the margin. If zero_start is true, set the minimum y value to 0. If all intervals in diagrams have the same birth (e.g. in the zeroth dimension), a single-column image is produced.
Keyword Arguments
size: integer or tuple of two integers. Determines the size of the array containing the image. Defaults to 5.
distribution: a function or callable object used to smear each interval in the diagram. Has to be callable with two Float64s as input and should return a Float64. Defaults to a normal distribution.
sigma: the width of the normal distribution mentioned above. Only applicable when distribution is unset. Defaults to twice the size of each pixel.
weight: a function or callable object used as the weighting function. Has to be callable with two Float64s as input and should return a Float64. Should equal 0.0 for $y=0$, but this is not enforced. Defaults to a function that is zero at $y=0$ and increases linearly to 1 until slope_end is reached.
slope_end: the relative $y$ value at which the default weight function stops increasing. Defaults to 1.0.
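The default weight described above can be pictured as a linear ramp in the persistence coordinate. The following is a minimal sketch, assuming the weight is called as weight(x, y) with y the persistence and that slope_end has already been converted to an absolute $y$ value. The name linear_ramp_weight is hypothetical and not part of PersistenceDiagrams.

```julia
# Hypothetical stand-in for the default weight: zero at y = 0, rising
# linearly to 1 at y = slope_end, and constant afterwards.
# The x (birth) argument is accepted but unused.
linear_ramp_weight(x, y; slope_end=1.0) = clamp(y / slope_end, 0.0, 1.0)

linear_ramp_weight(0.3, 0.0; slope_end=2.0)  # on the diagonal
linear_ramp_weight(0.3, 1.0; slope_end=2.0)  # halfway up the ramp
linear_ramp_weight(0.3, 5.0; slope_end=2.0)  # past slope_end
```

Any callable with this two-argument shape could be passed as weight. The ramp is just one choice that vanishes on the diagonal.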
Example
julia> diag_1 = PersistenceDiagram([(0, 1), (0, 1.5), (1, 2)]);
julia> diag_2 = PersistenceDiagram([(1, 2), (1, 1.5)]);
julia> image = PersistenceImage([diag_1, diag_2])
5×5 PersistenceImage(
distribution = PersistenceDiagrams.Binormal(0.5499999999999999),
weight = PersistenceDiagrams.DefaultWeightingFunction(1.65),
)
julia> image(diag_1)
5×5 Matrix{Float64}:
0.156707 0.164263 0.160452 0.149968 0.133353
0.344223 0.355089 0.338991 0.308795 0.268592
0.571181 0.577527 0.535069 0.47036 0.396099
0.723147 0.714873 0.639138 0.536823 0.432264
0.700791 0.677237 0.582904 0.46433 0.352962
Reference
Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., ... & Ziegelmeier, L. (2017). Persistence images: A stable vector representation of persistent homology. The Journal of Machine Learning Research, 18(1), 218-252.
PersistenceDiagrams.PersistenceCurve — Type
PersistenceCurve
Persistence curves offer a general way to transform a persistence diagram into a vector of numbers.
This is done by first splitting the time domain into buckets. Then the intervals contained in each bucket are collected and transformed by applying fun to each of them. The result is then summarized with the stat function. If an interval is only partially contained in a bucket, it is counted partially.
Once a PersistenceCurve is constructed (see below), it can be called to convert a persistence diagram to a vector of floats.
Constructors
PersistenceCurve(fun, stat, start, stop; length=10, integrate=true, normalize=false): create a curve with length buckets, the first starting at start and the last ending at stop.
PersistenceCurve(fun, stat, diagrams; length=10, integrate=true, normalize=false): learn the start and stop parameters from a collection of persistence diagrams.
Arguments
length: the length of the output. Defaults to 10.
fun: the function applied to each interval. Must have the following signature: fun(::AbstractPersistenceInterval, ::PersistenceDiagram, time)::T
stat: the summary function applied to the results of fun. Must have the following signature: stat(::Vector{T})::Float64
normalize: if set to true, normalize the result. Does not work for time-dependent funs. Defaults to false. Normalization is performed by dividing all values by stat(fun.(diag)).
integrate: if set to true, the amount of overlap between an interval and a bucket is considered. This prevents missing very small bars, but does not work correctly for curves with time-dependent funs where stat is a selection function (such as landscapes). If set to false, the curve is simply sampled at midpoints of buckets. Defaults to true.
Call
(::PersistenceCurve)(diagram; normalize, integrate)
Transforms a diagram. normalize and integrate override the defaults set in the constructor.
Example
julia> diagram = PersistenceDiagram([(0, 1), (0.5, 1), (0.5, 0.6), (1, 1.5), (0.5, Inf)]);
julia> curve = BettiCurve(0, 2, length = 4)
PersistenceCurve(always_one, sum, 0.0, 2.0; length=4, normalize=false, integrate=true)
julia> curve(diagram)
4-element Vector{Float64}:
1.0
3.2
2.0
1.0
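The 3.2 in the second bucket comes from partial counting: the short interval (0.5, 0.6) covers only a fifth of the bucket from 0.5 to 1.0, so it contributes 0.2. A minimal plain-Julia sketch of this bucket arithmetic follows. The names overlap_fraction, sketch_betti_curve and nbuckets are hypothetical, and this is not the package's implementation.

```julia
# Fraction of the bucket [lo, hi] covered by the interval (birth, death).
overlap_fraction(birth, death, lo, hi) =
    max(min(death, hi) - max(birth, lo), 0.0) / (hi - lo)

function sketch_betti_curve(intervals, start, stop; nbuckets=4)
    edges = range(start, stop; length=nbuckets + 1)
    # fun is constantly 1, so each interval contributes its overlap fraction;
    # stat is sum.
    return [sum(overlap_fraction(b, d, edges[i], edges[i + 1]) for (b, d) in intervals)
            for i in 1:nbuckets]
end

intervals = [(0.0, 1.0), (0.5, 1.0), (0.5, 0.6), (1.0, 1.5), (0.5, Inf)]
curve_values = sketch_betti_curve(intervals, 0.0, 2.0)
# approximately [1.0, 3.2, 2.0, 1.0], up to floating-point rounding
```

The infinite interval is handled naturally here, since min(Inf, hi) is just hi.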
See Also
The following are equivalent to PersistenceCurve with appropriately selected fun and stat arguments.
More options are listed in Table 1 on page 9 of the reference.
Reference
Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.
PersistenceDiagrams.BettiCurve — Function
BettiCurve
Betti curves count the Betti numbers at each time step. Unlike most vectorization methods, they support infinite intervals.
fun(_, _, _) = 1.0
stat = sum
See also
PersistenceDiagrams.Life — Function
Life
The life curve.
fun((b, d), _, _) = d - b
stat = sum
See also
Reference
Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.
PersistenceDiagrams.Midlife — Function
Midlife
The midlife curve.
fun((b, d), _, _) = (b + d) / 2
stat = sum
See also
Reference
Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.
PersistenceDiagrams.LifeEntropy — Function
LifeEntropy
The life entropy curve.
fun((b, d), diag, _) = begin
    x = (d - b) / sum(d - b for (b, d) in diag)
    -x * log2(x)
end
stat = sum
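To make the definition concrete, here is a minimal plain-Julia sketch that evaluates the life entropy at a single time. It assumes an interval contributes when birth <= t < death and that all intervals are finite. The name life_entropy_at is hypothetical and not part of the package.

```julia
function life_entropy_at(intervals, t)
    total = sum(d - b for (b, d) in intervals)   # total lifetime of the diagram
    value = 0.0
    for (b, d) in intervals
        b <= t < d || continue                   # only intervals alive at time t
        x = (d - b) / total                      # this interval's share of the total
        value -= x * log2(x)
    end
    return value
end

intervals = [(0.0, 1.0), (0.5, 1.5)]
life_entropy_at(intervals, 0.75)  # both alive, each with x = 0.5, giving 1.0
life_entropy_at(intervals, 0.25)  # only the first is alive, giving 0.5
```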
See also
Reference
Atienza, N., González-Díaz, R., & Soriano-Trigueros, M. (2018). On the stability of persistent entropy and new summary functions for TDA. arXiv preprint arXiv:1803.08304.
PersistenceDiagrams.MidlifeEntropy — Function
MidlifeEntropy
The midlife entropy curve.
fun((b, d), diag, _) = begin
    x = (b + d) / sum(b + d for (b, d) in diag)
    -x * log2(x)
end
stat = sum
See also
Reference
Chung, Y. M., & Lawson, A. (2019). Persistence curves: A canonical framework for summarizing persistence diagrams. arXiv preprint arXiv:1904.07768.
PersistenceDiagrams.PDThresholding — Function
PDThresholding
The persistence diagram thresholding function.
fun((b, d), _, t) = (d - t) * (t - b)
stat = mean
See also
Reference
Chung, Y. M., & Day, S. (2018). Topological fidelity and image thresholding: A persistent homology approach. Journal of Mathematical Imaging and Vision, 60(7), 1167-1179.
PersistenceDiagrams.Landscapes — Type
Landscapes(n, args...)
The first n persistence landscapes.
fun((b, d), _, t) = max(min(t - b, d - t), 0)
stat = get(sort(values, rev=true), k, 0.0)
Vectorizes to a matrix where each column is a landscape.
See also
Reference
Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1), 77-102.
PersistenceDiagrams.Landscape — Function
Landscape(k, args...)
The k-th persistence landscape.
fun((b, d), _, t) = max(min(t - b, d - t), 0)
stat = get(sort(values, rev=true), k, 0.0)
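The fun and stat above can be read as: build one tent function per interval, evaluate all of them at time t, and take the k-th largest value (or 0.0 if there are fewer than k). A minimal plain-Julia sketch follows. The names tent and landscape_at are hypothetical, and the package's bucketed evaluation is not reproduced here.

```julia
# Tent function of a single interval: rises from birth, peaks at the
# midpoint, falls back to zero at death.
tent(b, d, t) = max(min(t - b, d - t), 0.0)

function landscape_at(intervals, k, t)
    values = sort([tent(b, d, t) for (b, d) in intervals]; rev=true)
    return get(values, k, 0.0)   # k-th largest value, 0.0 when k is out of range
end

intervals = [(0.0, 2.0), (0.5, 1.5)]
landscape_at(intervals, 1, 1.0)  # 1.0, from the tent of (0.0, 2.0)
landscape_at(intervals, 2, 1.0)  # 0.5, from the tent of (0.5, 1.5)
landscape_at(intervals, 3, 1.0)  # 0.0, there is no third interval
```

Because stat selects a single value rather than aggregating, this is the kind of selection function for which the integrate option is documented not to work correctly.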
See also
Reference
Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1), 77-102.
PersistenceDiagrams.Silhuette — Function
Silhuette
The sum of persistence landscapes for all values of k.
fun((b, d), _, t) = max(min(t - b, d - t), 0)
stat = sum
See also