Tensor reshaping
In multilinear algebra, a reshaping of tensors is any bijection between the set of indices of an order- M {\displaystyle M} tensor and the set of indices of an order- L {\displaystyle L} tensor, where L < M {\displaystyle L<M} . The use of indices presupposes tensors in coordinate representation with respect to a basis. The coordinate representation of a tensor can be regarded as a multi-dimensional array, and a bijection from one set of indices to another therefore amounts to a rearrangement of the array elements into an array of a different shape. Such a rearrangement constitutes a particular kind of linear map between the vector space of order- M {\displaystyle M} tensors and the vector space of order- L {\displaystyle L} tensors.
This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. Find sources: "Tensor reshaping" – news · newspapers · books · scholar · JSTOR (April 2023) (Learn how and when to remove this message) |
In multilinear algebra, a reshaping of tensors is any bijection between the set of indices of an order-
M
{\displaystyle M}
tensor and the set of indices of an order-
L
{\displaystyle L}
tensor, where
L
<
M
{\displaystyle L<M}
. The use of indices presupposes tensors in coordinate representation with respect to a basis. The coordinate representation of a tensor can be regarded as a multi-dimensional array, and a bijection from one set of indices to another therefore amounts to a rearrangement of the array elements into an array of a different shape. Such a rearrangement constitutes a particular kind of linear map between the vector space of order-
M
{\displaystyle M}
tensors and the vector space of order-
L
{\displaystyle L}
tensors.
Definition
[edit]Given a positive integer
M
{\displaystyle M}
, the notation
[
M
]
{\displaystyle [M]}
refers to the set
{
1
,
…
,
M
}
{\displaystyle \{1,\dots ,M\}}
of the first M positive integers.
For each integer
m
{\displaystyle m}
where
1
≤
m
≤
M
{\displaystyle 1\leq m\leq M}
for a positive integer
M
{\displaystyle M}
, let
V
m
{\displaystyle V_{m}}
denote an
I
m
{\displaystyle I_{m}}
-dimensional vector space over a field
F
{\displaystyle F}
. Then there are vector space isomorphisms (linear maps)
V
1
⊗
⋯
⊗
V
M
≃
F
I
1
⊗
⋯
⊗
F
I
M
≃
F
I
π
1
⊗
⋯
⊗
F
I
π
M
≃
F
I
π
1
I
π
2
⊗
F
I
π
3
⊗
⋯
⊗
F
I
π
M
≃
F
I
π
1
I
π
3
⊗
F
I
π
2
⊗
F
I
π
4
⊗
⋯
⊗
F
I
π
M
⋮
≃
F
I
1
I
2
…
I
M
,
{\displaystyle {\begin{aligned}V_{1}\otimes \cdots \otimes V_{M}&\simeq F^{I_{1}}\otimes \cdots \otimes F^{I_{M}}\\&\simeq F^{I_{\pi _{1}}}\otimes \cdots \otimes F^{I_{\pi _{M}}}\\&\simeq F^{I_{\pi _{1}}I_{\pi _{2}}}\otimes F^{I_{\pi _{3}}}\otimes \cdots \otimes F^{I_{\pi _{M}}}\\&\simeq F^{I_{\pi _{1}}I_{\pi _{3}}}\otimes F^{I_{\pi _{2}}}\otimes F^{I_{\pi _{4}}}\otimes \cdots \otimes F^{I_{\pi _{M}}}\\&\,\,\,\vdots \\&\simeq F^{I_{1}I_{2}\ldots I_{M}},\end{aligned}}}
where
π
∈
S
M
{\displaystyle \pi \in {\mathfrak {S}}_{M}}
is any permutation and
S
M
{\displaystyle {\mathfrak {S}}_{M}}
is the symmetric group on
M
{\displaystyle M}
elements. Via these (and other) vector space isomorphisms, a tensor can be interpreted in several ways as an order-
L
{\displaystyle L}
tensor where
L
≤
M
{\displaystyle L\leq M}
.
Coordinate representation
[edit]The first vector space isomorphism on the list above,
V
1
⊗
⋯
⊗
V
M
≃
F
I
1
⊗
⋯
⊗
F
I
M
{\displaystyle V_{1}\otimes \cdots \otimes V_{M}\simeq F^{I_{1}}\otimes \cdots \otimes F^{I_{M}}}
, gives the coordinate representation of an abstract tensor. Assume that each of the
M
{\displaystyle M}
vector spaces
V
m
{\displaystyle V_{m}}
has a basis
{
v
1
m
,
v
2
m
,
…
,
v
I
m
m
}
{\displaystyle \{v_{1}^{m},v_{2}^{m},\ldots ,v_{I_{m}}^{m}\}}
. The expression of a tensor with respect to this basis has the form
A
=
∑
i
1
=
1
I
1
…
∑
i
M
=
1
I
M
a
i
1
,
i
2
,
…
,
i
M
v
i
1
1
⊗
v
i
2
2
⊗
⋯
⊗
v
i
M
M
,
{\displaystyle {\mathcal {A}}=\sum _{i_{1}=1}^{I_{1}}\ldots \sum _{i_{M}=1}^{I_{M}}a_{i_{1},i_{2},\ldots ,i_{M}}v_{i_{1}}^{1}\otimes v_{i_{2}}^{2}\otimes \cdots \otimes v_{i_{M}}^{M},}
where the coefficients
a
i
1
,
i
2
,
…
,
i
M
{\displaystyle a_{i_{1},i_{2},\ldots ,i_{M}}}
are elements of
F
{\displaystyle F}
. The coordinate representation of
A
{\displaystyle {\mathcal {A}}}
is
∑
i
1
=
1
I
1
…
∑
i
M
=
1
I
M
a
i
1
,
i
2
,
…
,
i
M
e
i
1
1
⊗
e
i
2
2
⊗
⋯
⊗
e
i
M
M
,
{\displaystyle \sum _{i_{1}=1}^{I_{1}}\ldots \sum _{i_{M}=1}^{I_{M}}a_{i_{1},i_{2},\ldots ,i_{M}}\mathbf {e} _{i_{1}}^{1}\otimes \mathbf {e} _{i_{2}}^{2}\otimes \cdots \otimes \mathbf {e} _{i_{M}}^{M},}
where
e
i
m
{\displaystyle \mathbf {e} _{i}^{m}}
is the
i
th
{\displaystyle i^{\text{th}}}
standard basis vector of
F
I
m
{\displaystyle F^{I_{m}}}
. This can be regarded as a M-way array whose elements are the coefficients
a
i
1
,
i
2
,
…
,
i
M
{\displaystyle a_{i_{1},i_{2},\ldots ,i_{M}}}
.
General flattenings
[edit]For any permutation
π
∈
S
M
{\displaystyle \pi \in {\mathfrak {S}}_{M}}
there is a canonical isomorphism between the two tensor products of vector spaces
V
1
⊗
V
2
⊗
⋯
⊗
V
M
{\displaystyle V_{1}\otimes V_{2}\otimes \cdots \otimes V_{M}}
and
V
π
(
1
)
⊗
V
π
(
2
)
⊗
⋯
⊗
V
π
(
M
)
{\displaystyle V_{\pi (1)}\otimes V_{\pi (2)}\otimes \cdots \otimes V_{\pi (M)}}
. Parentheses are usually omitted from such products due to the natural isomorphism between
V
i
⊗
(
V
j
⊗
V
k
)
{\displaystyle V_{i}\otimes (V_{j}\otimes V_{k})}
and
(
V
i
⊗
V
j
)
⊗
V
k
{\displaystyle (V_{i}\otimes V_{j})\otimes V_{k}}
, but may, of course, be reintroduced to emphasize a particular grouping of factors. In the grouping,
(
V
π
(
1
)
⊗
⋯
⊗
V
π
(
r
1
)
)
⊗
(
V
π
(
r
1
+
1
)
⊗
⋯
⊗
V
π
(
r
2
)
)
⊗
⋯
⊗
(
V
π
(
r
L
−
1
+
1
)
⊗
⋯
⊗
V
π
(
r
L
)
)
,
{\displaystyle (V_{\pi (1)}\otimes \cdots \otimes V_{\pi (r_{1})})\otimes (V_{\pi (r_{1}+1)}\otimes \cdots \otimes V_{\pi (r_{2})})\otimes \cdots \otimes (V_{\pi (r_{L-1}+1)}\otimes \cdots \otimes V_{\pi (r_{L})}),}
there are
L
{\displaystyle L}
groups with
r
l
−
r
l
−
1
{\displaystyle r_{l}-r_{l-1}}
factors in the
l
th
{\displaystyle l^{\text{th}}}
group (where
r
0
=
0
{\displaystyle r_{0}=0}
and
r
L
=
M
{\displaystyle r_{L}=M}
).
Letting
S
l
=
(
π
(
r
l
−
1
+
1
)
,
π
(
r
l
−
1
+
2
)
,
…
,
π
(
r
l
)
)
{\displaystyle S_{l}=(\pi (r_{l-1}+1),\pi (r_{l-1}+2),\ldots ,\pi (r_{l}))}
for each
l
{\displaystyle l}
satisfying
1
≤
l
≤
L
{\displaystyle 1\leq l\leq L}
, an
(
S
1
,
S
2
,
…
,
S
L
)
{\displaystyle (S_{1},S_{2},\ldots ,S_{L})}
-flattening of a tensor
A
{\displaystyle {\mathcal {A}}}
, denoted
A
(
S
1
,
S
2
,
…
,
S
L
)
{\displaystyle {\mathcal {A}}_{(S_{1},S_{2},\ldots ,S_{L})}}
, is obtained by applying the two processes above within each of the
L
{\displaystyle L}
groups of factors. That is, the coordinate representation of the
l
th
{\displaystyle l^{\text{th}}}
group of factors is obtained using the isomorphism
(
V
π
(
r
l
−
1
+
1
)
⊗
V
π
(
r
l
−
1
+
2
)
⊗
⋯
⊗
V
π
(
r
l
)
)
≃
(
F
I
π
(
r
l
−
1
+
1
)
⊗
F
I
π
(
r
l
−
1
+
2
)
⊗
⋯
⊗
F
I
π
(
r
l
)
)
{\displaystyle (V_{\pi (r_{l-1}+1)}\otimes V_{\pi (r_{l-1}+2)}\otimes \cdots \otimes V_{\pi (r_{l})})\simeq (F^{I_{\pi (r_{l-1}+1)}}\otimes F^{I_{\pi (r_{l-1}+2)}}\otimes \cdots \otimes F^{I_{\pi (r_{l})}})}
, which requires specifying bases for all of the vector spaces
V
k
{\displaystyle V_{k}}
. The result is then vectorized using a bijection
μ
l
:
[
I
π
(
r
l
−
1
+
1
)
]
×
[
I
π
(
r
l
−
1
+
2
)
]
×
⋯
×
[
I
π
(
r
l
)
]
→
[
I
S
l
]
{\displaystyle \mu _{l}:[I_{\pi (r_{l-1}+1)}]\times [I_{\pi (r_{l-1}+2)}]\times \cdots \times [I_{\pi (r_{l})}]\to [I_{S_{l}}]}
to obtain an element of
F
I
S
l
{\displaystyle F^{I_{S_{l}}}}
, where
I
S
l
:=
∏
i
=
r
l
−
1
+
1
r
l
I
π
(
i
)
{\textstyle I_{S_{l}}:=\prod _{i=r_{l-1}+1}^{r_{l}}I_{\pi (i)}}
, the product of the dimensions of the vector spaces in the
l
th
{\displaystyle l^{\text{th}}}
group of factors. The result of applying these isomorphisms within each group of factors is an element of
F
I
S
1
⊗
⋯
⊗
F
I
S
L
{\displaystyle F^{I_{S_{1}}}\otimes \cdots \otimes F^{I_{S_{L}}}}
, which is a tensor of order
L
{\displaystyle L}
.
Vectorization
[edit]By means of a bijective map
μ
:
[
I
1
]
×
⋯
×
[
I
M
]
→
[
I
1
⋯
I
M
]
{\displaystyle \mu :[I_{1}]\times \cdots \times [I_{M}]\to [I_{1}\cdots I_{M}]}
, a vector space isomorphism between
F
I
1
⊗
⋯
⊗
F
I
M
{\displaystyle F^{I_{1}}\otimes \cdots \otimes F^{I_{M}}}
and
F
I
1
⋯
I
M
{\displaystyle F^{I_{1}\cdots I_{M}}}
is constructed via the mapping
e
i
1
1
⊗
⋯
e
i
m
m
⊗
⋯
⊗
e
i
M
M
↦
e
μ
(
i
1
,
i
2
,
…
,
i
M
)
,
{\displaystyle \mathbf {e} _{i_{1}}^{1}\otimes \cdots \mathbf {e} _{i_{m}}^{m}\otimes \cdots \otimes \mathbf {e} _{i_{M}}^{M}\mapsto \mathbf {e} _{\mu (i_{1},i_{2},\ldots ,i_{M})},}
where for every natural number
i
{\displaystyle i}
such that
1
≤
i
≤
I
1
⋯
I
M
{\displaystyle 1\leq i\leq I_{1}\cdots I_{M}}
, the vector
e
i
{\displaystyle \mathbf {e} _{i}}
denotes the ith standard basis vector of
F
i
1
⋯
i
M
{\displaystyle F^{i_{1}\cdots i_{M}}}
. In such a reshaping, the tensor is simply interpreted as a vector in
F
I
1
⋯
I
M
{\displaystyle F^{I_{1}\cdots I_{M}}}
. This is known as vectorization, and is analogous to vectorization of matrices. A standard choice of bijection
μ
{\displaystyle \mu }
is such that
vec
(
A
)
=
[
a
1
,
1
,
…
,
1
a
2
,
1
,
…
,
1
⋯
a
n
1
,
1
,
…
,
1
a
1
,
2
,
1
,
…
,
1
⋯
a
I
1
,
I
2
,
…
,
I
M
]
T
,
{\displaystyle \operatorname {vec} ({\mathcal {A}})={\begin{bmatrix}a_{1,1,\ldots ,1}&a_{2,1,\ldots ,1}&\cdots &a_{n_{1},1,\ldots ,1}&a_{1,2,1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{M}}\end{bmatrix}}^{T},}
which is consistent with the way in which the colon operator in Matlab and GNU Octave reshapes a higher-order tensor into a vector. In general, the vectorization of
A
{\displaystyle {\mathcal {A}}}
is the vector
[
a
μ
−
1
(
i
)
]
i
=
1
I
1
⋯
I
M
{\displaystyle [a_{\mu ^{-1}(i)}]_{i=1}^{I_{1}\cdots I_{M}}}
.
The vectorization of
A
{\displaystyle {\mathcal {A}}}
denoted with
v
e
c
(
A
)
{\displaystyle vec({\mathcal {A}})}
or
A
[
:
]
{\displaystyle {\mathcal {A}}_{[:]}}
is an
[
S
1
,
S
2
]
{\displaystyle [S_{1},S_{2}]}
-reshaping where
S
1
=
(
1
,
2
,
…
,
M
)
{\displaystyle S_{1}=(1,2,\ldots ,M)}
and
S
2
=
∅
{\displaystyle S_{2}=\emptyset }
.
Mode-m Flattening / Mode-m Matrixization
[edit]Let
A
∈
F
I
1
⊗
F
I
2
⊗
⋯
⊗
F
I
M
{\displaystyle {\mathcal {A}}\in F^{I_{1}}\otimes F^{I_{2}}\otimes \cdots \otimes F^{I_{M}}}
be the coordinate representation of an abstract tensor with respect to a basis.
Mode-m matrixizing (a.k.a. flattening) of
A
{\displaystyle {\mathcal {A}}}
is an
[
S
1
,
S
2
]
{\displaystyle [S_{1},S_{2}]}
-reshaping in which
S
1
=
(
m
)
{\displaystyle S_{1}=(m)}
and
S
2
=
(
1
,
2
,
…
,
m
−
1
,
m
+
1
,
…
,
M
)
{\displaystyle S_{2}=(1,2,\ldots ,m-1,m+1,\ldots ,M)}
. Usually, a standard matrixizing is denoted by
A
[
m
]
=
A
[
S
1
,
S
2
]
{\displaystyle {\mathbf {A} }_{[m]}={\mathcal {A}}_{[S_{1},S_{2}]}}
This reshaping is sometimes called matrixizing, matricizing, flattening or unfolding in the literature. A standard choice for the bijections
μ
1
,
μ
2
{\displaystyle \mu _{1},\ \mu _{2}}
is the one that is consistent with the reshape function in Matlab and GNU Octave, namely
A
[
m
]
:=
[
a
1
,
1
,
…
,
1
,
1
,
1
,
…
,
1
a
2
,
1
,
…
,
1
,
1
,
1
,
…
,
1
⋯
a
I
1
,
I
2
,
…
,
I
m
−
1
,
1
,
I
m
+
1
,
…
,
I
M
a
1
,
1
,
…
,
1
,
2
,
1
,
…
,
1
a
2
,
1
,
…
,
1
,
2
,
1
,
…
,
1
⋯
a
I
1
,
I
2
,
…
,
I
m
−
1
,
2
,
I
m
+
1
,
…
,
I
M
⋮
⋮
⋮
a
1
,
1
,
…
,
1
,
I
m
,
1
,
…
,
1
a
2
,
1
,
…
,
1
,
I
m
,
1
,
…
,
1
⋯
a
I
1
,
I
2
,
…
,
I
m
−
1
,
I
m
,
I
m
+
1
,
…
,
I
M
]
{\displaystyle {\mathbf {A} }_{[m]}:={\begin{bmatrix}a_{1,1,\ldots ,1,1,1,\ldots ,1}&a_{2,1,\ldots ,1,1,1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{m-1},1,I_{m+1},\ldots ,I_{M}}\\a_{1,1,\ldots ,1,2,1,\ldots ,1}&a_{2,1,\ldots ,1,2,1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{m-1},2,I_{m+1},\ldots ,I_{M}}\\\vdots &\vdots &&\vdots \\a_{1,1,\ldots ,1,I_{m},1,\ldots ,1}&a_{2,1,\ldots ,1,I_{m},1,\ldots ,1}&\cdots &a_{I_{1},I_{2},\ldots ,I_{m-1},I_{m},I_{m+1},\ldots ,I_{M}}\end{bmatrix}}}
Definition Mode-m Matrixizing:[1]
[
A
[
m
]
]
j
k
=
a
i
1
…
i
m
…
i
M
,
where
j
=
i
m
and
k
=
1
+
∑
n
=
0
n
≠
m
M
(
i
n
−
1
)
∏
l
=
0
l
≠
m
n
−
1
I
l
.
{\displaystyle [{\mathbf {A} }_{[m]}]_{jk}=a_{i_{1}\dots i_{m}\dots i_{M}},\;\;{\text{ where }}j=i_{m}{\text{ and }}k=1+\sum _{n=0 \atop n\neq m}^{M}(i_{n}-1)\prod _{l=0 \atop l\neq m}^{n-1}I_{l}.}
The mode-m matrixizing of a tensor
A
∈
F
I
1
×
.
.
.
I
M
,
{\displaystyle {\mathcal {A}}\in F^{I_{1}\times ...I_{M}},}
is defined as the matrix
A
[
m
]
∈
F
I
m
×
(
I
1
…
I
m
−
1
I
m
+
1
…
I
M
)
{\displaystyle {\mathbf {A} }_{[m]}\in F^{I_{m}\times (I_{1}\dots I_{m-1}I_{m+1}\dots I_{M})}}
. As the parenthetical ordering indicates, the mode-m column vectors are arranged by
sweeping all the other mode indices through their ranges,
with smaller mode indexes varying more rapidly than larger ones; thus
References
[edit]- ^ Vasilescu, M. Alex O. (2009), "Multilinear (Tensor) Algebraic Framework for Computer Graphics, Computer Vision and Machine Learning" (PDF), University of Toronto, p. 21