What is EUGene, anyway?
EUGene is a program designed primarily for political scientists. It has two purposes. First, EUGene generates data for variables used to test Bruce Bueno de Mesquita and colleagues’ version of an expected utility theory of war and dispute initiation (Bueno de Mesquita 1981, 1985; Bueno de Mesquita and Lalman 1992). Second, EUGene serves as a data management tool for creating data sets for use in the quantitative analysis of international relations, with the country-year, directed dyad-year, non-directed dyad-year, or directed dispute-dyad-year as the unit of analysis. Until now, the expected utility data have been unavailable, and the data management tasks have frequently been cumbersome and difficult. A paper exploring EUGene’s capabilities and rationale is available here.
EUGene is an acronym for Expected Utility Generation and data management program. EUGene is freeware, but is copyrighted. A full download (see the download page) contains the program, expected utility data, documentation, and source code.
- Provides expected utility data both as raw utility scores and as equilibrium predictions from the International Interaction game (Bueno de Mesquita and Lalman 1992).
- Provides updated risk scores for all countries and years from 1816 to 1984 (updated according to the methods in “The War Trap Revisited,” Bueno de Mesquita 1985)
- Creates output data sets with the directed dyad-year, non-directed dyad-year, country-year, and directed dispute-dyad units of analysis
- Creates flat text output files containing expected utility (and other) data; variables can be separated by tabs, commas, or spaces (your choice) for easy input into statistics programs
- Automatically creates command files to read data into STATA, SPSS, and LIMDEP
- Builds data sets using your choice of variables from a variety of input data sets:
  - input data sets include Polity III data and the Correlates of War capabilities, alliance, contiguity, and system membership data sets
  - computes derivative variables including tau-b scores, expected utility information, state-to-state distances, and democratization
  - converts input data sets with country-year or dyad-year unit of analysis into a directed dyad-year format
- Allows you to select a population of cases in space (politically relevant, major power, contiguous, or specified selection of dyads) and time (choose all years 1816-1993 or a subrange)
- Can generate a sample of dyads, either a straight sample or stratified according to dispute behavior
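The sampling option in the last item can be illustrated with a short sketch. This is not EUGene's own code: the function name `stratified_sample`, the `dispute` flag on each row, and the reading of "stratified" as "sample at a different rate within the dispute and non-dispute strata" are all assumptions made for illustration.

```python
import random

def stratified_sample(rows, dispute_rate, nondispute_rate, seed=0):
    """Keep each dyad-year with a probability that depends on its stratum.

    rows is a list of dicts with a boolean "dispute" key (hypothetical
    layout). Setting both rates to the same value gives a straight sample.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    kept = []
    for row in rows:
        rate = dispute_rate if row["dispute"] else nondispute_rate
        # rng.random() is in [0, 1), so rate 1.0 keeps every row
        # and rate 0.0 keeps none.
        if rng.random() < rate:
            kept.append(row)
    return kept

# Made-up dyad-years for illustration only.
dyad_years = [
    {"state_a": "USA", "state_b": "USR", "year": 1946, "dispute": True},
    {"state_a": "USA", "state_b": "UKG", "year": 1946, "dispute": False},
    {"state_a": "UKG", "state_b": "USR", "year": 1946, "dispute": False},
]
all_disputes = stratified_sample(dyad_years, 1.0, 0.0)
```

Keeping every dispute case while thinning the far more numerous non-dispute cases is one common reason to stratify; the rates themselves are a choice for the analyst.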
Bueno de Mesquita and Lalman’s so-called expected utility theory of war has become one of the most important theories of international conflict. Following the most recent explication, a more appropriate label might be “the War and Reason game-theoretic theory of war,” since War and Reason (Bueno de Mesquita and Lalman 1992) explicitly models game-theoretic interactions between states, and since the “international interaction” game in War and Reason represents only one of many possible games of international conflict that could be constructed.
Whatever the label, the testing of expected utility theory has lagged behind its theoretical development. Even though it is one of the most widely cited theories of international relations, in its most sophisticated formulation the theory has been tested only on 707 dyad-years, all drawn from Europe between 1815 and 1970. Testing has been limited because the necessary data for wider analysis – namely risk attitude scores and utility values for all states and years – have not been available.
EUGene generates expected utility data for all dyads and years. EUGene combines the 1992 methodology of War and Reason with an easy-to-use program to calculate expected utility values. The program both generates expected utility data and predicts the International Interaction game equilibrium in any given dyad-year. In addition, the program provides users with options for modifying expected utility calculations. EUGene is the first program to implement Bueno de Mesquita and Lalman’s (1992) methodology to generate data for the full population of cases in which we are interested as international relations scholars.
EUGene simplifies a number of cumbersome tasks associated with building data sets for the quantitative analysis of international relations, especially data sets with the directed dyad-year as the unit of analysis. An example of a directed dyad-year is the US vs. the USSR in 1946. Scholars have increasingly used data sets based on the dyad-year (both directed and non-directed) to conduct quantitative analyses, both because dyadic interaction is believed to be at the heart of strategic international behavior and because dyadic data make it possible to combine explanations from multiple levels of analysis in one quantitative study. Nevertheless, creating dyadic data sets is an onerous task for most researchers. On the independent variable side, creating dyadic data sets involves merging data and renaming variables from multiple monadic data sets. On the dependent variable side, the most common data sets with international conflict events (the Correlates of War Militarized Interstate Dispute data set and Interstate War data set) are not organized in dyadic form and must be converted into dyadic interactions; such conversions are not always straightforward.
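To make the independent-variable side concrete, here is a minimal sketch of turning monadic country-year records into directed dyad-years. It is an illustration under stated assumptions, not EUGene's implementation: the function `directed_dyad_years`, the keys `state_a` and `state_b`, the `_a`/`_b` suffixes, and the `cap` values are all hypothetical.

```python
from itertools import permutations

def directed_dyad_years(country_years):
    """Build directed dyad-year rows from monadic country-year records.

    country_years maps (country, year) -> dict of monadic variables.
    Every ordered pair of states with a record in the same year
    becomes one row.
    """
    by_year = {}
    for (country, year), values in country_years.items():
        by_year.setdefault(year, {})[country] = values

    rows = []
    for year in sorted(by_year):
        states = by_year[year]
        # permutations yields both (A, B) and (B, A): the dyad is directed.
        for a, b in permutations(sorted(states), 2):
            row = {"state_a": a, "state_b": b, "year": year}
            # Rename each monadic variable so the two sides stay distinct.
            row.update({f"{name}_a": v for name, v in states[a].items()})
            row.update({f"{name}_b": v for name, v in states[b].items()})
            rows.append(row)
    return rows

# Made-up values for illustration, not real capability scores.
monadic = {
    ("USA", 1946): {"cap": 0.3},
    ("USR", 1946): {"cap": 0.2},
    ("UKG", 1946): {"cap": 0.1},
}
dyads = directed_dyad_years(monadic)
```

Three states in one year yield six directed dyad-years, which shows how quickly the merge-and-rename work grows when it is done by hand over many states, years, and source data sets.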
EUGene reads, merges, and outputs data from several of the most important other data sets in international relations. In addition to simply merging the data files, EUGene outputs the data in a uniform format that can be imported with ease into any statistical analysis package. Some of the input data sets have the country-year as the unit of analysis (e.g., the Correlates of War national capability data, Gurr Polity data). Other data sets have the dyad as the unit of analysis, such as data about the physical distance between states, or the Correlates of War contiguity data set. Still other data come in a hybrid form, such as the Correlates of War alliance data set, which has the country-year as its unit of analysis in the data set structure, but whose underlying data are dyadic and annual. Finally, some of the input data sets use multiple data set structures, such as the Correlates of War militarized interstate dispute data set, which comes as three files, one containing country-dispute level records, and two containing dispute-level records. EUGene carries out the necessary conversions between the formats, file structures, and differing units of analysis of these data sets.
EUGene also allows users to specify subsets of countries, years, and a variety of variables for output. Data sets are saved in a text format that can be easily read into other programs for statistical analysis. EUGene creates the command files for execution in SPSS, Stata, and LIMDEP to read in the data sets it creates.
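As a rough illustration of this output step, the sketch below writes selected variables to a delimited text file and builds a one-line Stata command file that reads it back in. It does not reproduce EUGene's actual output or the command files it generates: the function names, the variable names, and the file name `eugene_out.csv` are hypothetical, and the `import delimited` command is from current Stata releases rather than anything the source describes.

```python
import csv
import io

# Separator choices named in the feature list: tabs, commas, or spaces.
DELIMITERS = {"tab": "\t", "comma": ",", "space": " "}

def write_flat(rows, variables, out, separator="comma"):
    """Write the chosen variables, one case per line, with a header row."""
    writer = csv.writer(out, delimiter=DELIMITERS[separator],
                        lineterminator="\n")
    writer.writerow(variables)
    for row in rows:
        writer.writerow([row[name] for name in variables])

def stata_do_file(data_path):
    """Return the text of a minimal Stata do-file that loads the data."""
    return f'import delimited using "{data_path}", clear\n'

# Made-up case for illustration only.
rows = [{"state_a": "USA", "state_b": "USR", "year": 1946}]
buffer = io.StringIO()
write_flat(rows, ["state_a", "state_b", "year"], buffer, separator="tab")
flat_text = buffer.getvalue()
do_text = stata_do_file("eugene_out.csv")
```

Generating the command file alongside the data keeps the variable list in one place, so the statistics package never has to be told the file layout by hand.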
The tasks that EUGene carries out can be (and have been) executed in other software programs. However, it is cumbersome and time-consuming to repeatedly program large numbers of merge and case selection commands in other programs. For some (namely those not familiar with SPSS or Stata syntax), data merging becomes a manual process of cut and paste. We believe that the set of options provided with EUGene significantly simplifies the task of building data sets containing information from multiple inputs, allowing analysts to spend less time merging data and more time performing analysis.
- Bueno de Mesquita, Bruce, and David Lalman. 1992. War and Reason. New Haven: Yale University Press.
- Bueno de Mesquita, Bruce. 1981. The War Trap. New Haven: Yale University Press.
- Bueno de Mesquita, Bruce. 1985. “The War Trap Revisited.” American Political Science Review 79: 156-177.