In studying networks, the identification of network

communities is very important, because through it is possible to find groups of objects that interact

and the relationships that connect them.

There are several examples of communities that are

part of networks:

·

in social networks, groups of friends who attend

the same meeting places, or the same school, or come from the same neighborhood;

or groups of relatives;

·

in protein networks, communities are functional

modules of interacting proteins

·

in networks between scholars, groups of

co-authors, or research groups.

By identifying community networks, you can identify

functionally linked objects 2, examine interactions between various modules

4 10, and make predictions about currently unseen connections 9

The problem of identifying community networks can be

understood in this way: we have to cluster sets of node in various communities,

keeping in mind that each node can belong to several communities at the same

time 8. The critical point of the problem lies in the imprecise definition of

what a community is. In Sociology or in Computer science, community structure

is sometimes referred to as “cluster”. But, if a community can be viewed as a

set of links/relationships between nodes, but what are its boundaries? In other

words: when a relationship between two nodes must be considered part of a

community, and when not? A community is generally regarded as a part of a

network within which connections are denser than external ones. But how much

denser?

Since the main elements of the problem, i.e. the

concepts of community and partition, are not rigorously defined, and are

therefore understood with a certain degree of arbitrariness and common sense

5, the problem of discovering communities in a network is also ill-posed. Furthermore,

there are many hidden ambiguities and various equally legitimate ways of

dealing with them.

As a result, there are plenty of recipes in the

literature, but the authors do not start with shared definitions. Some criteria

were identified: complete mutuality, reachability, vertex degree and the

comparison of internal versus external cohesion X. Although many definitions have been proposed,

none of them is considered standard.

To develop detection algorithms, we need a quite strict

definition. Here, there are two definition that translate sentences into

formulas.

The variables to consider are the adjacency matrix $ A

$ and the degree $ k_i $. The network topology is defined in the adjacency

matrix. In the case of an unweighted graph, the cell $ A_ {i, j} $ will have

value 1 if there is an edge between the node i and the node j; otherwise it

will have value 0. Instead,

in the case of a weighted graph, the cell $ A_ {i, j} $ will have a value equal

to the weight of the edge that connects the node i to the node j. The $ k_i $

degree of a $ i $ node is defined in terms of the adjacency matrix as follows: $ k_i = sum

nolimits_ {j} A_ {i, j} $.

If we consider a sub-graph $ V subset G and a node $

i $ such that $ i in V $, the total degree can be seen as the sum of two

different contributions (indergree and outdegree):

egin{equation}

k_i(V) = k_i^{in}(V) + k_i^{out}(V)

end{equation}

where

egin{equation}

k_i^{in}(V) =

sum

olimits_{j in V} A_{i,j}

end{equation}

is the number of edges connecting node $i$ to other

nodes belonging to $V$ and

egin{equation}

k_i^{out}(V) = sum

olimits_{j

otin V} A_{i,j}

end{equation}

is the number of connections toward nodes in the rest

of the network.

subsubsection*{Definition

of Community in a Strong Sense}

The subgraph V is a community in a strong sense if:

egin{equation}

k_i^{in}(V)

> k_i^{out}(V), forall i in V

end{equation}

When each node of the subgraph $ V $ has more edges

towards the inside of $ V $ than towards the rest of the graph, $ V $ can be

defined a strong community.

subsubsection*{Definition

of Community in a Weak Sense.}

The

subgraph V is a community in a weak sense if:

egin{equation}

sum

olimits_{i

in V} k_i^{in}(V) > sum

olimits_{i in V} k_i^{out}(V)

end{equation}

When into the subgraph $ V $ the sum of the indegrees

of each node is greater than the sum of the outdegree, $ V $ can be defined a

weak community. It is

easy to understand that a strong community is also a community in a weak sense,

but the opposite is not true.

As mentioned above, these definitions of the community

have the advantage of being very intuitive, but they are not the only ones.

Several other definitions have been proposed, some of which are more suitable

for certain practical contexts.

To give some examples, LS-set is a definition similar

to the strong community one, but even more restrictive. “An LS-set is a set of nodes such that each of

its proper subsets has more ties to its complement within the set than outside”.

cit letterale. Another

definition is the k-core one, which is closer to the definition of the

community in a weak sense. A subgraph is called k-core when each of its nodes

has at least k edges connected to the rest of the subgraph itself.

Attributi e relazioni

In a community, each node is characterized by

attributes, also called properties, and by relationships with other nodes of

the community itself. The methods for identifying communities can then use

these two categories of data. Depending on the method, one or the other

category is privileged, but there are also approaches that try to take into

account both, such as those that will be explained later in this text 8

The first type of data corresponds to the adjacency

matrix, that is the set of relations existing between the various nodes of the

graph. Examples of such relationships can be those referred to above: relations

between friends, interactions between proteins, collaborations between

scholars.

The second type of data concerns the attributes, i.e.

the characteristics associated with each graph’s node. For example, chemical proprieties

of proteins, personal information about users of social networks, lists of

publications by scholars of the same discipline.

Algorithms that use attributes are called clustering

algorithms. Their goal is to find sets of objects with similar attributes,

without taking into account the relationships between them. Si noti che molte volte è possibile modellare gli attributi

come nodi e i nodi come attributi. In this perspective, a cluster is

conceived as a set of objects that are closer to each other than the other

objects in the dataset. To define the closeness a measure of similarity is

used, which is usually defined over the set of entities.3

In contrast, community detection algorithms use

relationships between nodes, while they usually do not take into account node

attributes. In particular, they try to identify those graph sections in which

the nodes are more densely connected to each other.

Both the clustering approach and the community

detection approach, used independently of each other, have some limits. For

example, if a node has few links, but instead has many attributes in common

with other nodes, considering only the relationships it will not be possible to

tell which community it belongs to. Conversely, it is possible that two nodes

belong to the same community, even if they have no common attribute, but to

detect this it will be necessary to consider data about relationships 8.

Therefore, the best would be to keep both aspects in

mind. Communities should therefore result as a set of nodes that are densely

connected to each other and have some common attributes 8. By adopting this

double perspective, any missing or noise in the data could also be overcome,

compensating for the lack in a data type with the data of the other type. It

should be noted, however, that this combined approach is challenging, because

it requires to use two very different sources of information. 8

Ordine e disordine

nel grafo

Graphs

representing real systems are objects where order coexists with disorder. They

usually do not have regular structures (lattices), nor are they random graphs. In

a random graph, the probability of having an edge between a pair of nodes

assumes a uniform distribution and therefore the distribution of the edges

between vertices is very homogeneous. Unlike random graphs, real networks are

very inhomogeneous, with a very high level of organization in some areas, and

low or absent organization in others. Moreover, locally the edges often

concentrate within particular groups of nodes, and these groups are not very

connected to each other. 3