In functional modules of interacting proteins · in networks

In studying networks, the identification of network
communities is very important, because through it is possible to find groups of objects that interact
and the relationships that connect them.

There are several examples of communities that are
part of networks:

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!

order now

in social networks, groups of friends who attend
the same meeting places, or the same school, or come from the same neighborhood;
or groups of relatives;

in protein networks, communities are functional
modules of interacting proteins

in networks between scholars, groups of
co-authors, or research groups.


By identifying community networks, you can identify
functionally linked objects 2, examine interactions between various modules
4 10, and make predictions about currently unseen connections 9


The problem of identifying community networks can be
understood in this way: we have to cluster sets of node in various communities,
keeping in mind that each node can belong to several communities at the same
time 8. The critical point of the problem lies in the imprecise definition of
what a community is. In Sociology or in Computer science, community structure
is sometimes referred to as “cluster”. But, if a community can be viewed as a
set of links/relationships between nodes, but what are its boundaries? In other
words: when a relationship between two nodes must be considered part of a
community, and when not? A community is generally regarded as a part of a
network within which connections are denser than external ones. But how much


Since the main elements of the problem, i.e. the
concepts of community and partition, are not rigorously defined, and are
therefore understood with a certain degree of arbitrariness and common sense
5, the problem of discovering communities in a network is also ill-posed. Furthermore,
there are many hidden ambiguities and various equally legitimate ways of
dealing with them.

As a result, there are plenty of recipes in the
literature, but the authors do not start with shared definitions. Some criteria
were identified: complete mutuality, reachability, vertex degree and the
comparison of internal versus external cohesion X. Although many definitions have been proposed,
none of them is considered standard.

To develop detection algorithms, we need a quite strict
definition. Here, there are two definition that translate sentences into


The variables to consider are the adjacency matrix $ A
$ and the degree $ k_i $. The network topology is defined in the adjacency
matrix. In the case of an unweighted graph, the cell $ A_ {i, j} $ will have
value 1 if there is an edge between the node i and the node j; otherwise it
will have value 0. Instead,
in the case of a weighted graph, the cell $ A_ {i, j} $ will have a value equal
to the weight of the edge that connects the node i to the node j. The $ k_i $
degree of a $ i $ node is defined in terms of the adjacency matrix as follows: $ k_i = sum
nolimits_ {j} A_ {i, j} $.

If we consider a sub-graph $ V subset G and a node $
i $ such that $ i in V $, the total degree can be seen as the sum of two
different contributions (indergree and outdegree):


k_i(V) = k_i^{in}(V) + k_i^{out}(V)




k_i^{in}(V) = 
olimits_{j in V} A_{i,j}



is the number of edges connecting node $i$ to other
nodes belonging to $V$ and


k_i^{out}(V) = sum
otin V} A_{i,j}


is the number of connections toward nodes in the rest
of the network.


of Community in a Strong Sense}


The subgraph V is a community in a strong sense if:


> k_i^{out}(V), forall i in V



When each node of the subgraph $ V $ has more edges
towards the inside of $ V $ than towards the rest of the graph, $ V $ can be
defined a strong community.



of Community in a Weak Sense.}


subgraph V is a community in a weak sense if:


in V} k_i^{in}(V) > sum
olimits_{i in V} k_i^{out}(V)



When into the subgraph $ V $ the sum of the indegrees
of each node is greater than the sum of the outdegree, $ V $ can be defined a
weak community. It is
easy to understand that a strong community is also a community in a weak sense,
but the opposite is not true.


As mentioned above, these definitions of the community
have the advantage of being very intuitive, but they are not the only ones.

Several other definitions have been proposed, some of which are more suitable
for certain practical contexts.


To give some examples, LS-set is a definition similar
to the strong community one, but even more restrictive. “An LS-set is a set of nodes such that each of
its proper subsets has more ties to its complement within the set than outside”.

cit letterale. Another
definition is the k-core one, which is closer to the definition of the
community in a weak sense. A subgraph is called k-core when each of its nodes
has at least k edges connected to the rest of the subgraph itself.


Attributi e relazioni

In a community, each node is characterized by
attributes, also called properties, and by relationships with other nodes of
the community itself. The methods for identifying communities can then use
these two categories of data. Depending on the method, one or the other
category is privileged, but there are also approaches that try to take into
account both, such as those that will be explained later in this text 8


The first type of data corresponds to the adjacency
matrix, that is the set of relations existing between the various nodes of the
graph. Examples of such relationships can be those referred to above: relations
between friends, interactions between proteins, collaborations between

The second type of data concerns the attributes, i.e.

the characteristics associated with each graph’s node. For example, chemical proprieties
of proteins, personal information about users of social networks, lists of
publications by scholars of the same discipline.

Algorithms that use attributes are called clustering
algorithms. Their goal is to find sets of objects with similar attributes,
without taking into account the relationships between them. Si noti che molte volte è possibile modellare gli attributi
come nodi e i nodi come attributi. In this perspective, a cluster is
conceived as a set of objects that are closer to each other than the other
objects in the dataset. To define the closeness a measure of similarity is
used, which is usually defined over the set of entities.3

In contrast, community detection algorithms use
relationships between nodes, while they usually do not take into account node
attributes. In particular, they try to identify those graph sections in which
the nodes are more densely connected to each other.

Both the clustering approach and the community
detection approach, used independently of each other, have some limits. For
example, if a node has few links, but instead has many attributes in common
with other nodes, considering only the relationships it will not be possible to
tell which community it belongs to. Conversely, it is possible that two nodes
belong to the same community, even if they have no common attribute, but to
detect this it will be necessary to consider data about relationships 8.

Therefore, the best would be to keep both aspects in
mind. Communities should therefore result as a set of nodes that are densely
connected to each other and have some common attributes 8. By adopting this
double perspective, any missing or noise in the data could also be overcome,
compensating for the lack in a data type with the data of the other type. It
should be noted, however, that this combined approach is challenging, because
it requires to use two very different sources of information. 8


Ordine e disordine
nel grafo

representing real systems are objects where order coexists with disorder. They
usually do not have regular structures (lattices), nor are they random graphs. In
a random graph, the probability of having an edge between a pair of nodes
assumes a uniform distribution and therefore the distribution of the edges
between vertices is very homogeneous. Unlike random graphs, real networks are
very inhomogeneous, with a very high level of organization in some areas, and
low or absent organization in others. Moreover, locally the edges often
concentrate within particular groups of nodes, and these groups are not very
connected to each other. 3