Yasin Bayrak

Mehmet Ali Sezgin

Ph.D. (c) in Mathematics and Statistics

University of Ljubljana

STOCHASTIC BLOCK MODEL

1. Introduction

First of all, we need to explain the term “stochastic”: it means random, or subject to chance variation. Stochastic processes have an important place in statistics and engineering. As a basic example, companies look at past demand quantities and try to predict how much demand their products will see in the coming months. Instead of guessing outright, they examine closely how demand behaved over past days or months. Because demand is stochastic, it is a variable quantity, and companies draw on mathematics, and especially statistics, when analyzing it.
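As a toy illustration of this idea (all numbers below are made up for the sketch, not real demand data), a few lines of base R can draw a year of random monthly demand and forecast the next month as the historical mean:

```r
set.seed(1)

# One year of monthly demand, treated as a random (stochastic) quantity.
# rpois() with a mean of 100 units stands in for the unknown demand process.
demand <- rpois(12, lambda = 100)

# A naive forecast: estimate next month's demand by the mean of the
# twelve observed months.
forecast_next <- mean(demand)
round(forecast_next)
```

Running the sketch twice with different seeds gives different demand paths, which is exactly what “stochastic” means here.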

The stochastic block model (SBM) is a random, generative graph model with planted clusters. It is usually used for clustering and community detection. Community detection is a very important problem in network and graph analysis: in many systems the building elements fail randomly, and many approaches have been proposed to solve this challenging problem in data science.

The stochastic block model was first proposed for social networks, with numerical examples shown to illustrate the method. Social network analysis uses graph and network theory to investigate social structures, and it is a key technique in modern sociology. Determining the complex structures of sociology is only possible with the right statistical and functional constructions. Social scientists have used the concept of “social networks” since early in the 20th century to connote complex sets of relationships between members of social systems at all scales, from interpersonal to international. Georg Simmel and Émile Durkheim were the two figures who wrote about the importance of studying patterns of relationships connecting social actors at the beginning of social network history.

Many modern computer languages and software packages, such as R, Matlab and Gephi, are used for modelling and visualizing social networks. An example image of a social network analysis from ITCILO is shown below.

These kinds of graphs, especially in 3D, help greatly in the analysis of social structures: they visualize relationships inside and outside an organisation and identify knowledge bottlenecks, isolated individuals and groups.

Networks have become one of the more common forms of data, and network analysis has received a lot of attention in computer science, physics, social sciences, biology and statistics. The applications are many and varied, including social networks, gene regulatory networks, recommender systems and security monitoring.

A stochastic block model network can represent both positive and negative (good and bad) relationships in graphs. In addition, especially when social network analysis is used as a tool for facilitating change, different approaches to participatory network mapping have proven useful.

Stochastic block models can be used in many disciplines and areas related to social networks, and with new technologies and software they can be integrated into many new fields. Social network analysis with block models is also used in intelligence, counter-intelligence and law enforcement activities. The technique allows analysts to map a clandestine or covert organization such as an espionage ring, an organized crime family or a street gang. Since the relational data in such settings are noisy and partly random, probabilistic models such as block models are particularly convenient.

2. History of Stochastic Block Model

The Stochastic Block Model (SBM) method exists thanks to Social Network Analysis (SNA), whose early steps were taken by Auguste Marie François Xavier Comte (Auguste Comte), who worked with Henri Saint-Simon starting from 1817.

Comte described society in terms now recognized as social network analysis terminology. Most other prominent nineteenth and early twentieth century sociologists embraced Comte’s structural perspective. A common theme involved describing differences in the patterning of social connections in traditional versus modern societies.

Ferdinand Tönnies (1855/1936) made a similar distinction when he used the word gemeinschaft to characterize the traditional social form that involved personal and direct social ties that linked individuals who shared values and beliefs.

Émile Durkheim (1893/1964) described traditional societies in which solidarité mécanique linked similar individuals with repressive regulations.

Sir Herbert Spencer (1897) in England and Charles Horton Cooley (1909/1962) in America both described traditional small-scale societies in which individuals were linked by intimate, primary relations, and both contrasted those with modern, large-scale societies where individuals are often linked by impersonal, secondary relations.

The Swiss naturalist Pierre Huber collected systematic data on social organization in the early nineteenth century, before the structural approach was laid out by Comte. In 1792 he published a large-scale study based on systematic observation of honeybees, continuing even after he became blind with the help of his wife and his servant.

Recent research in network analysis still includes observation-based studies of the patterning of social linkages among nonhumans. And network analysts still conduct systematic studies of dominance. Huber’s work, then, provided a model for later research in both biology and social network analysis.

In 1926, Beth Wellman collected network data by recording systematic observations of who played with whom among preschool children during periods of free play. She pioneered, then, in extending Huber’s ethological approach to the study of human interaction. In network analysis involving human subjects, data generated by questioning actors are common, but observational data of social links are still gathered.

In 1928, Helen Bott went even further, refining Beth Wellman’s approach in several ways. First, she used ethnographic methods to uncover the various forms of interaction that occurred regularly among preschool children. These methods enabled her to limit systematic observations to the specific kinds of interaction relevant in the context of her research. Second, she was the first to employ a focal-child method for collecting detailed observations of who displayed each particular form of interaction with whom.

In 1933, Elizabeth Hagman brought these two approaches to data collection – observation and interview – together. She explicitly raised a data-related issue that is still a central concern among contemporary network analysts. She observed which children played with which others during a period of free play. Then, at the end of the school term, she interviewed each child. Each was asked to name who their playmates had been at the beginning of the term, in the middle of the term and at the end of the term. She then compared the observed data with the reports and examined the discrepancy between the two. The discrepancy she found defined a research problem that remains a key issue for a great many recent investigators in the social network field (Bernard, Killworth, Kronenfeld and Sailer, 1985; Freeman, Romney and Freeman, 1987).

The very first examples of graphical depictions of social networks among these observations focused on kinship. Those images show pictorially the general pattern of proximity between the occupants of any two kin categories.

Near the end of the nineteenth century, Alexander Macfarlane developed a formal model of the British kinship system.

Unlike many other approaches to social research, network analysis has consistently drawn on various branches of mathematics both to clarify its concepts and to spell out their consequences in precise terms.

Watson began with the notion of a population of family names. He proposed a set of parameters involving the probabilities of a given man producing 0, 1, 2,…, q male offspring. From these properties, Watson calculated the expected proportions of each surname in each succeeding generation. Of course, any surname holder who produced no male offspring would contribute to the reduction of representatives of his name in succeeding generations. So Watson was able to show that, simply by a random process of reproduction, “We have a continual extinction of surnames going on.” His conclusion was that any family name would ultimately disappear with a probability of 1.
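Watson’s argument can be illustrated with a small simulation. The offspring probabilities below (0, 1 or 2 sons with probabilities 0.3, 0.4 and 0.3) are invented for this sketch; the point is only that, even with an average of exactly one son per man, surnames keep dying out:

```r
set.seed(5)

# Follow one surname: start with a single male and, in each generation,
# give every male 0, 1 or 2 sons with probabilities 0.3, 0.4 and 0.3
# (made-up numbers; the mean is exactly one son per man).
simulate_lineage <- function(generations = 20) {
  males <- 1
  for (g in 1:generations) {
    if (males == 0) return(0)  # the surname is already extinct
    males <- sum(sample(0:2, males, replace = TRUE,
                        prob = c(0.3, 0.4, 0.3)))
  }
  males
}

# Estimate the probability that the surname is extinct within 20
# generations; as Watson argued, it keeps creeping toward 1.
survivors <- replicate(1000, simulate_lineage())
mean(survivors == 0)
</n```

Increasing `generations` pushes the estimated extinction fraction ever closer to Watson’s limiting probability of 1.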

Some, like Morgan (1871), Macfarlane (1883) and Hobson (1894), produced work that embodied two of the features. Morgan collected huge amounts of systematic data on kinship and displayed his results in graphic images. Macfarlane developed an algebraic model of kinship and he too used graphic images to display its properties. Hobson collected systematic data on corporate interlocks, then drew hypergraphs to reveal their observed interlock patterns. By employing two of the four tools that define modern social network analysis, these nineteenth century investigators began to approach current practice.

In any case, in the early 1930s a broad research effort called sociometry was introduced. It was the first work that included all four of the defining features of social network analysis.

By 1938, then, the work of Moreno—with the help of Jennings and Lazarsfeld—had displayed all four of the features that define contemporary social network analysis. It is clear, moreover, that they recognized the generality of their approach. They collected data on positive and negative emotional choices and on who was acquainted with whom. They observed interaction patterns linking individuals. They discussed kinship ties. And they examined social roles.

A research effort that focused on the study of social structure began at Harvard in the late 1920s. Centered in the Graduate School of Business Administration on one side of the Charles River and the Society of Fellows on the other, it involved a relatively large number of faculty, including William Lloyd Warner, George Elton Mayo, Fritz Roethlisberger, T. North Whitehead and Lawrence J. Henderson. A number of students from a variety of disciplines were also involved, including Eliot Chapple, Conrad Arensberg, Allison Davis, Elizabeth Davis, Burleigh Gardner, George Caspar Homans and William Foote Whyte.

By the 1940s, much of the excitement that Moreno and Jennings had generated in the 1930s had already started to wane. Moreover, the Harvard group had broken up and its members had drifted away from structural analysis. So the period in question was essentially a kind of “dark ages” for social network analysis. There was no generally recognized approach to social research that embodied the structural paradigm. Social network analysis was still not identifiable either as a theoretical perspective or as an approach to data collection and analysis.

Loomis trained a large number of graduate students in the use of sociometric research tools (Driscoll et al., 1993). With their help, he conducted a series of comparative studies of small villages and rural areas throughout the world. (Examples are Loomis, 1946; Holland and Loomis, 1948; Loomis and Powell, 1949; Loomis and Proctor, 1950.) At the same time, Loomis recognized the importance of mathematics in structural research and sought out colleagues in mathematics to help him deal with the complexities of network analysis. Prominent among those was the mathematical statistician Leo Katz.

Over the next few years, then, Katz produced a series of papers that made major contributions to sociometry and, in the long run, to social network analysis (Forsyth and Katz, 1946; Katz, 1947; Katz and Powell, 1955; Bhargava and Katz, 1963). In addition Katz led several students in mathematics to work on applied problems in structural analysis. Among these, Charles Proctor and T. N. Bhargava made important contributions.

Lévi-Strauss provided both the intuitive background and the data for Weil’s algebraic modeling. Weil, in turn, developed a model for one of the most complicated kinship systems, the Australian Murngin, who had been studied by W. Lloyd Warner. Lévi-Strauss described his basis for asking Weil to build a model (1960).

Both Lévi-Strauss and Weil used graphic images in their treatment of kinship. Jointly they produced a work that included all of the properties of social network analysis, although they did draw on the earlier work of Radcliffe-Brown.

Early in the 1950s an effort was started at Lund University in Sweden, led by a Swedish geographer, Torsten Hägerstrand. Hägerstrand and his future wife decided to trace the lives of every person who had lived in the Swedish rural area of Asby from 1840 to 1940. All in all, they recorded the lives and migration patterns of more than 10,000 people, and set out to do theory-based structural work in geography.

Hägerstrand’s work was structural. It displayed all the features of social network analysis. He provided a model for a geography that explored theoretical issues and attempted to explain the distributions of objects in physical space. That approach had a tremendous impact on the field of geography and led a whole generation of geographers to do similar kinds of structural work.

In the late 1950s MIT experienced a rebirth as a place for network thinking and research—this time centered in the political science department.

By the end of the 1950s enough structural research had been published so that any new network studies simply had to be derivative. Nevertheless, several research groups emerged in the 1960s that succeeded in broadening the perspective and introducing it to new audiences.

In 1961, Flament was invited to participate in a summer program in Mathematics and the Social Sciences organized by Paul Lazarsfeld. At that time, James S. Coleman was editing a series of books on Mathematical Analysis of Social Behavior published by Prentice Hall. Coleman asked Flament to do a book for the series, and in 1963, Flament’s book, Applications of Graph Theory to Group Structure, was published. That book presented an integrated approach to both communication research and structural balance and presented applications on communication in work groups, on political blocs and on kinship structures.

At the end of the 1960s, no version of network analysis was yet universally recognized as providing a general paradigm for social research. By then, however, the broad community of people engaged in social research were ready to embrace a structural paradigm. It was in this setting that Harrison White and his students began their structural work.

Computers have played a key role in the development of social network analysis. In fact, Alvin W. Wolfe (1978) argued that the field could not have developed without them. The special procedures entailed in the analysis of relation-based data have from the beginning required the development of programs tailored to social network applications.

The earliest of these special purpose programs were all relatively simple and task-specific (Freeman, 1988). In the late 1950s James S. Coleman and Duncan MacRae (1960) produced a computer program that was designed to find groups of closely linked individuals in a social network data set. Later, Coleman’s graduate student Seymour Spilerman produced a new algorithm that refined and extended their approach (1966). Soon thereafter Samuel Leinhardt (1971) attacked a very different kind of problem. He wrote a program, SOCPAC I, to tabulate the various kinds of pairs and triples that can be found in social network data. That same year Gregory Heil and Harrison White introduced BLOCKER, a program designed to uncover actors who occupied similar positions in the overall structure. And in 1972 Richard D. Alba and Myron P. Gutmann (1972) returned to the original problem of uncovering groups. They wrote a program, SOCK, and Alba added another, COMPLT, that took another approach to specifying groups. A year later H. Russell Bernard and Peter D. Killworth (1973) produced still another group-finding program, CATIJ, based on yet another algorithm. The program CONCOR was introduced in 1975 by Ronald L. Breiger, Scott A. Boorman and Phipps Arabie (1975). It was designed to find not groups, but collections of individuals occupying similar structural positions in a network. The next year, Gregory H. Heil and Harrison C. White (1976) produced a new version of the BLOCKER program that provided another way to find equivalent structural positions. And that same year Ronald S. Burt produced a program called STRUCTURE that took a third approach to the same task. William D. Richards (1975) developed NEGOPY—another group finder—based on still another algorithm. Stephen B. Seidman and Brian L. Foster (1978) released SONET, a collection of graph theoretic tools for dealing with kinship relations. In 1979 I wrote a program, CENTER, that used several algorithms to determine the extent to which individuals occupied central positions in their social networks. And in 1981 a group in the Netherlands, including Robert Mokken, Frans Stokman and Jac M. Anthonisse, developed another graph theoretic program, GRADAP. This program, like my program, CENTER, focused particularly on uncovering central positions. And finally, that same year, Peter Carrington and Gregory H. Heil wrote COBLOC, yet another program designed to uncover equivalent structural positions. These early programs varied widely. They were concerned with groups, positions, centrality, kinship structure and distributions of structural properties. This kind of variation suggests again that the early social network community was, as I have argued, diverse. But at the beginning of the 1980s various attempts were made to tie all of these separate approaches together by producing a general-purpose network analysis program. Gregory H. Heil, working at the University of Toronto, made an effort to produce an integrated set of network analysis tools. Douglas R. White and Lee D. Sailer at the University of California at Irvine made a similar attempt, as did J. Clyde Mitchell, Clive Payne and David Deans at Oxford University. Unfortunately, none of these efforts panned out; none produced a program for general use. But in 1983 Franz Urban Pappi and Peter Kappelhoff of Christian-Albrechts-Universität in Kiel produced a general-purpose network analysis program called SONIS (Pappi and Stelck, 1987). And that same year, working with a team at the University of California at Irvine, I produced the first version of a program called UCINET.
Since then, UCINET has been refined and extended, first in Version 3.0 through the efforts of a post-doctoral student, Bruce MacEvoy, and more recently in later versions by Stephen P. Borgatti and Martin G. Everett. (My involvement in these later versions has been peripheral.) In any case, both SONIS and UCINET were explicitly designed to include all the procedures that network analysts—regardless of their background—might want to use.

These four programs, then—STRUCTURE, GRADAP and particularly SONIS and UCINET—all made an attempt to include the full range of network analytic procedures. All four have gone through several revisions and all are still being distributed.

Then, out of the need for a clustered representation within social network analysis, stochastic block models were developed.

3. Usage Areas and Developments

Network science is potentially useful for many problems in data analysis, and researchers in many fields, as well as businesses in several industries, have exploited recent advances in information technology to produce an explosion of data on complex systems. The stochastic block model is a social network model with defined communities, in which each node is a member of exactly one community.

A minimal network example in R:
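A minimal sketch of such an example, written in base R so it needs no extra packages (the block sizes and edge probabilities below are arbitrary choices for illustration), samples an adjacency matrix from a two-block SBM in which within-block edges are much more likely than between-block edges:

```r
set.seed(1)

# Two planted blocks of 10 nodes each; within-block ties (0.5) are much
# more likely than between-block ties (0.05).
block_sizes <- c(10, 10)
P <- matrix(c(0.50, 0.05,
              0.05, 0.50), nrow = 2)

n <- sum(block_sizes)
membership <- rep(seq_along(block_sizes), block_sizes)

# Sample a symmetric adjacency matrix, one dyad at a time.
A <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    p <- P[membership[i], membership[j]]
    A[i, j] <- A[j, i] <- rbinom(1, 1, p)
  }
}

# Empirical densities: typically near 0.5 inside block 1 and near 0.05
# between the two blocks.
within1 <- A[1:10, 1:10]
mean(within1[upper.tri(within1)])
mean(A[1:10, 11:20])
```

The same model can also be sampled with `sample_sbm()` from the igraph package; the loop above just makes the generative mechanism explicit.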

Researchers employ social network analysis in the study of computer-supported collaborative learning in part due to the unique capabilities it offers.

Many complex systems have interacting units or actors that networks or graphs can naturally represent, providing a range of disciplines with a suite of potential questions on how to extract knowledge from network data.

In recent years, both theoretical and computational studies of stochastic block model network analysis have been growing rapidly in many areas, including bioinformatics, academic collaboration, economics, biology, artificial intelligence and social media. Statistical models have a relatively long history in discovering hidden structural knowledge of networks. When random variables and probabilities enter a network, the stochastic block model gives us a maintainable model to test in our research. Many approaches, such as maximum likelihood or clustering methods, can also support stochastic block model analysis.

Stochastic block models can also be used for simulation: depending on the research question, we can generate many possible results from a single main model, and these results can be visualized with many statistical and mathematical software packages. Example outputs from several simulations are shown below.
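To make this run-to-run variability concrete, the base-R sketch below (with arbitrary example parameters) draws the total edge count of the same two-block SBM five times; each draw corresponds to one simulation of the kind shown in the figures:

```r
set.seed(3)

# A hypothetical two-block SBM; the parameters are made up for illustration.
n_per_block <- 15
p_in  <- 0.40  # within-block edge probability
p_out <- 0.05  # between-block edge probability
z <- rep(1:2, each = n_per_block)

# One simulation draw: sample every dyad and return the total edge count.
one_draw <- function() {
  n <- length(z)
  edges <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      p <- if (z[i] == z[j]) p_in else p_out
      edges <- edges + rbinom(1, 1, p)
    }
  }
  edges
}

# Five independent simulations of the same model give five different
# graphs; the edge counts vary from run to run.
replicate(5, one_draw())
```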

In the graphs (simulations) above, you can see many possible random (stochastic) outputs for the block model network, and many results can be interpreted from the different simulations for the research at hand. For example, the result of SC in Figure 3 clearly indicates that the clustering task is moderately difficult. Further interpretations can be added.

Stochastic block models were usually applied with at least one of the parameters known, but recent research shows that it is not necessary to know the parameters of the model. The SBM is also very useful for machine learning: machine learning problems usually require big data sets, and to fit these data sets, which include many variables, we need complex, randomly drawn parameters from the network-based models.

Likelihood-based model selection can also be used for stochastic block models. Example density graphs for this approach are shown below.
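The quantity underlying such comparisons can be sketched as a profile log-likelihood of a fixed block assignment under a simple Bernoulli SBM. The helper below is a hypothetical illustration written for this note, not code from the cited paper:

```r
# Profile log-likelihood of a block assignment z (labels 1..K) under a
# Bernoulli SBM: each block pair gets its empirical edge probability
# plugged back into the likelihood.
sbm_loglik <- function(A, z) {
  K <- max(z)
  ll <- 0
  for (a in 1:K) {
    for (b in a:K) {
      sub <- A[z == a, z == b, drop = FALSE]
      if (a == b) {
        m <- sum(sub[upper.tri(sub)])             # edges inside block a
        N <- sum(z == a) * (sum(z == a) - 1) / 2  # possible dyads
      } else {
        m <- sum(sub)                             # edges between a and b
        N <- sum(z == a) * sum(z == b)
      }
      p <- m / max(N, 1)
      # p = 0 or p = 1 contributes 0 in the limit, so skip it.
      if (p > 0 && p < 1) ll <- ll + m * log(p) + (N - m) * log(1 - p)
    }
  }
  ll
}

# Toy check: a 4-node graph with two obvious blocks.
A <- matrix(0, 4, 4)
A[1, 2] <- A[2, 1] <- 1
A[3, 4] <- A[4, 3] <- 1
sbm_loglik(A, c(1, 1, 2, 2))  # 0: every fitted p is exactly 0 or 1
sbm_loglik(A, c(1, 2, 1, 2))  # negative: a worse assignment
```

Comparing such log-likelihoods across candidate block numbers, with a suitable penalty, is the essence of likelihood-based model selection.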

Below you can also find two graphical outputs for the SBM.

In the reference paper, the researchers studied the problem of selecting the number of communities under both the regular SBM and the DCSBM, allowing the average degree to grow at a polylog rate while the true block number stays fixed. They showed that the log likelihood ratio statistic has an asymptotic normal distribution when a smaller model with fewer blocks is specified.

Modeling relations between individuals is a classical question in the social sciences, and clustering individuals according to the observed patterns of interactions allows us to uncover a latent structure in the data. The stochastic block model (SBM) is a popular approach for grouping individuals with respect to their social behaviour. When several relationships of various types can occur jointly between individuals, the data are represented by multiplex networks, where more than one edge can exist between two nodes. When the goal is to cluster individuals according to their social behaviour, we can derive a stochastic block model version of the multiplex Erdős–Rényi model.

The dynamic stochastic block model is another version of the SBM, revising its notation for dynamic graphs and visualizations. With this method, several kinds of real networks, such as communication networks, financial transaction networks, mobile telephone networks and social networks (Facebook, LinkedIn, etc.), can be modelled via graphs.

Many scientists and researchers across scientific fields need social network analysis and block model networks in their research, and all of this activity will develop the area further. Fast developments in computer science and software engineering will also help the development of social network analysis. Bootstrapping simulations with an efficient algorithm, and correcting classic model selection theories, can be applied to block models based on the simulation data.

To understand the development of stochastic block model networks, we first need to understand generalized graphs, random variables and the probability theory in statistics that helps us analyze our models and blocks. Stochastic block models can also be used in nonparametric research. These approaches do not assume prior knowledge of the number of groups or other dimensions of the model, which are instead inferred from the data. One example application gives a comprehensive treatment of different kinds of edge weights (i.e., continuous or discrete, signed or unsigned, bounded or unbounded), as well as arbitrary weight transformations, and describes an unsupervised model selection approach to choose the best network description.

Below is a stochastic block model example in R using the “Bank Wiring” data set. The data set, shown below, contains many variables that include the blocks (dimensions).

Each variable has its own matrix, and correlation is used to measure the similarity of similarities. If we repeat this procedure over and over, we eventually end up with a matrix whose entries take on one of two values: 1 or -1. The final matrix can then be permuted to produce blocks of 1s and -1s, with each block representing a group of structurally equivalent actors.
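The repeated-correlation idea behind CONCOR can be sketched in a few lines of base R, here on random toy data rather than the Bank Wiring relations:

```r
set.seed(2)

# A toy similarity matrix: correlations among 10 random variables.
m <- cor(matrix(rnorm(100), nrow = 10))

# CONCOR-style iteration: correlate the columns of the correlation matrix,
# over and over. Convergence is typically fast, with entries settling near
# +1 or -1 (degenerate data can stall the iteration, hence the hedging).
for (i in 1:100) m <- cor(m)

range(round(m, 4))
```

Permuting the rows and columns of the converged matrix so that same-signed entries sit together reveals the blocks of structurally equivalent actors described above.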

Stochastic block model outputs:

Relation graphs for all variables:

4. Bibliography

Stochastic blockmodels: First steps, Paul W. Holland, Kathryn Blackmond Laskey and Samuel Leinhardt

CONCOR in R, Adam, r-bloggers

Nonparametric weighted stochastic block models, Tiago P. Peixoto, Phys. Rev. E 97, 012306, published 16 January 2018

Likelihood-based model selection for stochastic block models, Y. X. Rachel Wang (Department of Statistics, Stanford University) and Peter J. Bickel (Department of Statistics, University of California, Berkeley), March 2016

Recovering Communities in the General Stochastic Block Model Without Knowing the Parameters, Emmanuel Abbe and Colin Sandon

Improving Stochastic Block Models by Incorporating Power-Law Degree Characteristic, Maoying Qiao, Jun Yu, Wei Bian, Qiang Li and Dacheng Tao

A minimal network example in R, Sieste, r-bloggers, 2012

Spectral clustering and the high-dimensional stochastic blockmodel, Karl Rohe, Sourav Chatterjee and Bin Yu

Social Network Analysis, Tom Wambake

Social Network Analysis, wiki-zero

Stochastic Block Model, wiki-zero

The Development of Social Network Analysis: A Study in the Sociology of Science, Linton C. Freeman

Building Stochastic Block Models, Carolyn J. Anderson, Stanley Wasserman and Katherine Faust

Network Analysis and Modeling, CSCI 5352, Prof. Aaron Clauset, 5 November 2013

Introduction to Stochastic Actor-Based Models for Network Dynamics, Tom A.B. Snijders, Gerhard G. van de Bunt, Christian E.G. Steglich

Manual for RSiena, Ruth M. Ripley, Tom A.B. Snijders, Zsófia Boda, András Vörös and Paulina Preciado

Community Detection and Stochastic Block Models: Recent Developments, Emmanuel Abbe

Historical Social Network Analysis, Charles Wetherell

R Codes

# Install and load the required packages.
install.packages("devtools")
library(devtools)
devtools::install_github("aslez/concoR")
library(concoR)
install.packages("sna")
library(sna)

# Bank Wiring data: a list of relation matrices.
data(bank_wiring)
bank_wiring

# Correlate the stacked relations to get the initial similarity matrix.
m0 <- cor(do.call(rbind, bank_wiring))
round(m0, 2)

# CONCOR partition with two splits.
blks <- concor_hca(bank_wiring, p = 2)
blks

# Fit and plot the block model.
blk_mod <- blockmodel(bank_wiring, blks$block,
                      glabels = names(bank_wiring),
                      plabels = rownames(bank_wiring[[1]]))
blk_mod
plot(blk_mod)