The syntax for a merge is: merge type keyvars using dataset. The type must be 1:1 (one-to-one), 1:m (one-to-many), m:1 (many-to-one) or m:m (many-to-many); keyvars is the key variable or variables; and dataset is the name of the dataset you want to merge.
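As an illustration of how the type is chosen, here is a minimal sketch of a many-to-one merge. The variable names (famid, person, age, income) and the toy values are hypothetical and only serve the example.

```stata
* Family-level data: one row per family (hypothetical variables)
clear
input famid income
1 40000
2 55000
end
tempfile families
save `families'

* Person-level data: several rows per family
clear
input famid person age
1 1 34
1 2 36
2 1 51
end

* Many persons map to one family record, so the type is m:1
merge m:1 famid using `families'
tabulate _merge
```

Every person here finds a family record, so all three observations are matches. Had the person-level file been the using dataset instead, the type would be 1:m.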

How to deal with more than one dataset in Stata?

Graduate Statistical Assistant Program, FMRC

If the difficulty is that you have too many variables in the data file, use Stata/SE. Note that Stat/Transfer may be updated (for free) to create datasets in the Stata/SE binary dataset format. If you do not have Stata/SE, or are interested in combining datasets for other purposes, please continue with this document.

When the number of variables in a dataset to be analyzed with Stata is larger than 2,047 (very likely with large surveys), the dataset is divided into several segments, each saved as a Stata dataset (.dta file). To work with information contained in two or more .dta files, you must merge the segments into a single new file, which must not contain more than 2,047 variables. Here is a list of steps to construct a new dataset with information merged from different files. Any manipulation of the data made with a Stata do-file is easier to review and repeat; an example of how to write a do-file is given below.

1. Review the codebook or list of variables and determine what information is needed and which files contain the desired variables.

2. Read into Stata the first file, or segment:
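A minimal sketch of this step follows. So that it runs anywhere, it first builds a small stand-in for the segment in a temporary file; the variable names and values are hypothetical. With a real segment in the working directory, the command would simply be use segment1.dta, clear.

```stata
* Build a small stand-in for segment1.dta (hypothetical contents)
clear
input ID X11 X12 X13 Z1
1 5 6 7 0
2 8 9 10 1
3 11 12 13 0
end
tempfile segment1
save `segment1'

* Step 2: read the segment into memory, replacing whatever is there
use `segment1', clear
describe
```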

Note that a unique ID for each case (observation) must be provided in each file to be merged. Typically the ID for a time series dataset is the date of the observation. For a cross section, it is the ID of the cross-section unit (family identifier, firm CUSIP, etc.). In panel data, two characteristics are needed to identify each observation: date and ID. For panel data, however, a single 'case ID' is sometimes provided to facilitate merging.

The unique ID must be stored in the same form in every file: you cannot match a 'str8' (8-character string) ID to a 'str6' ID, nor can you match a string to an integer. Use Stata's 'describe' command to ensure that the name and data type of the ID variable are the same in all files.
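The check and one possible repair can be sketched as follows. The two toy files and their variables are hypothetical, and the sketch assumes the string IDs contain only digits; if they did not, destring would refuse to convert them, and converting the numeric ID with tostring would be the alternative.

```stata
* Hypothetical file A stores ID as a number
clear
input ID X11
101 1
102 2
end
tempfile fileA
save `fileA'

* Hypothetical file B stores the same ID as a string
clear
input str3 ID X21
"101" 3
"102" 4
end

* Check the storage type, then convert the string ID to numeric
describe ID
destring ID, replace
tempfile fileB
save `fileB'

* Both IDs are now numeric, so the files can be matched
use `fileA', clear
describe ID
merge 1:1 ID using `fileB'
```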

3. Discard the variables that are NOT needed (keeping the case ID); this can be done in at least two ways. Wildcards (*) and hyphens (-) may be used in the varlists; see 'help varlist' for their use.

Use the keep command if the useful variables can be listed more easily, or the drop command if the unwanted variables can be listed more easily.

Remember that the case ID must be part of the new file.
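Both ways of discarding variables can be sketched on a toy segment. The variable names (X11 to X13 as the wanted variables, Z1 and Z2 as the unwanted ones) are hypothetical.

```stata
* Toy segment with hypothetical variable names
clear
input ID X11 X12 X13 Z1 Z2
1 5 6 7 0 0
2 8 9 10 1 1
end
tempfile segment1
save `segment1'

* Option 1: keep the wanted variables
* (a hyphen selects a range of variables in dataset order)
use `segment1', clear
keep ID X11-X13
describe

* Option 2: drop the unwanted variables
* (a wildcard matches every name that starts with Z)
use `segment1', clear
drop Z*
describe
```

Either route leaves ID, X11, X12 and X13 in memory.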

4. Verify that only the desired variables are in memory, for example with the describe command.

5. Sort the data by case ID with the sort command.

6. Save the sorted data currently in memory with a different name:
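Steps 4 to 6 might look like the sketch below, run on toy data with hypothetical values. The isid check is an addition that is not part of the listed steps; it stops with an error if ID does not uniquely identify the observations. Note that the last command writes newfile1.dta to the current working directory.

```stata
* Toy data standing in for a trimmed segment (hypothetical values)
clear
input ID X11 X12 X13
3 11 12 13
1 5 6 7
2 8 9 10
end

* Step 4: list the variables currently in memory
describe

* Step 5: check that ID identifies each observation, then sort by it
isid ID
sort ID

* Step 6: save under a new name so the original segment is untouched
save newfile1, replace
```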

7. Repeat steps 2 to 6 for all files containing the desired variables. You will end up with a set of new files (newfile1.dta, newfile2.dta, ..., newfileJ.dta) to be merged into a new dataset. Now you are ready to merge the data.


The merge command merges corresponding observations from the dataset currently in memory (called the master dataset) with those from a different Stata-format dataset (called the using dataset) into single observations. A new variable _merge is created for informative purposes (described below). Both files must be previously sorted by the merge variable(s), e.g. case ID.

8. Merge the first two new files.

a) Read the master dataset (newfile1.dta, created in step 6):

b) Merge the data with the using dataset (newfile2.dta):
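Steps 8a and 8b can be sketched as below. The two temporary files stand in for newfile1.dta and newfile2.dta, and their contents are hypothetical. The sketch assumes Stata 11 or later, where the merge type (here 1:1) is part of the command and merge sorts the data itself when needed; earlier versions used the form merge ID using newfile2 on files that were already sorted.

```stata
* Two toy files standing in for newfile1.dta and newfile2.dta
clear
input ID X11
1 5
2 8
3 11
end
sort ID
tempfile newfile1
save `newfile1'

clear
input ID X21
1 50
2 80
4 110
end
sort ID
tempfile newfile2
save `newfile2'

* Step 8a: read the master dataset
use `newfile1', clear

* Step 8b: merge with the using dataset on ID
merge 1:1 ID using `newfile2'
list
```

With real files in the working directory, the two commands would be use newfile1, clear and merge 1:1 ID using newfile2.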

c) Tabulate _merge:


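The variable _merge is created automatically and takes the following values:

  • 1: the observation appears in the master dataset only
  • 2: the observation appears in the using dataset only
  • 3: the observation appears in both datasets (a match)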

You can use the tabulated information to check if the data were merged as desired.
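One way to carry out this check is sketched below on toy data with hypothetical values. Here IDs 1 and 2 appear in both files, ID 3 only in the master and ID 4 only in the using dataset.

```stata
* Toy master and using data (hypothetical values)
clear
input ID X11
1 5
2 8
3 11
end
tempfile master
save `master'

clear
input ID X21
1 50
2 80
4 110
end
tempfile other
save `other'

use `master', clear
merge 1:1 ID using `other'

* Step 8c: count observations by match status
tabulate _merge

* List the cases that did not match, to see which file they came from
list ID _merge if _merge != 3

* assert reports any unmatched case; capture noisily shows the
* message without halting this example
capture noisily assert _merge == 3
```

In a do-file where every case is expected to match, a plain assert _merge == 3 would stop execution at the first surprise.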

d) Drop the _merge variable with drop _merge, so that the next merge can create it again.

e) If there are more than two files to be merged, use the data currently in memory as the master dataset and repeat steps 8b-8d for each remaining file (newfile3.dta, newfile4.dta, ..., newfileJ.dta).
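Repeating steps 8b to 8d lends itself to a loop. The sketch below assumes three files; it generates toy stand-ins for newfile1.dta to newfile3.dta in temporary files, with hypothetical variables X11, X21 and X31.

```stata
* Three toy files standing in for newfile1.dta to newfile3.dta
forvalues j = 1/3 {
    clear
    set obs 4
    generate ID = _n
    generate X`j'1 = 10 * `j' + _n
    tempfile newfile`j'
    save `newfile`j''
}

* Start from the first file, then merge the others in turn
use `newfile1', clear
forvalues j = 2/3 {
    merge 1:1 ID using `newfile`j''
    tabulate _merge
    drop _merge
}
describe
```

In Stata 11 or later, the nogenerate option of merge skips creating _merge altogether, at the cost of losing the tabulation check.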

9. Save the new dataset under a new name with the save command.

Sample program


Here is an example of how a do-file can be used to merge data contained in three hypothetical segments.

  • Variables to merge: X11, X12, X13, X21, X22, X23, X31, X32 and X33
  • Segments containing these variables: segment1.dta, segment2.dta and segment3.dta
  • Identifier: ID (the variable ID, contained in each of the three segments)


This do-file merges selected variables from segment1.dta, segment2.dta and segment3.dta into a new file named newdatabase.dta. Its output is recorded in the log file logmerge.smcl for further reference.
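A minimal sketch of such a do-file follows. It assumes that X11 to X13 are stored in segment1.dta, X21 to X23 in segment2.dta and X31 to X33 in segment3.dta, and that Stata 11 or later is used (for the merge 1:1 syntax). The first loop is only a convenience for trying the sketch: it creates toy segments, with hypothetical values and an unwanted variable EXTRA, when no segment files exist in the working directory.

```stata
* Setup for trying the sketch: create toy segments only if none exist
forvalues s = 1/3 {
    capture confirm file segment`s'.dta
    if _rc {
        clear
        set obs 5
        generate ID = _n
        forvalues v = 1/3 {
            generate X`s'`v' = 100 * `s' + 10 * `v' + _n
        }
        generate EXTRA`s' = 0
        save segment`s'.dta
    }
}

* Record the session in logmerge.smcl
capture log close
log using logmerge, replace

* Steps 2 to 7: trim, check, sort and save each segment
forvalues s = 1/3 {
    use segment`s'.dta, clear
    keep ID X`s'1 X`s'2 X`s'3
    describe
    sort ID
    save newfile`s', replace
}

* Step 8: merge the trimmed files one at a time
use newfile1, clear
forvalues s = 2/3 {
    merge 1:1 ID using newfile`s'
    tabulate _merge
    drop _merge
}

* Step 9: save the merged dataset and close the log
save newdatabase, replace
log close
```

Saved as, say, mergesegments.do (a hypothetical name), it can be run with do mergesegments, and the log can be reviewed afterwards with view logmerge.smcl.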

Thanks to Petia Petrova for contributions to this document.

Last updated: 05 March 2002 by Kolver Hernandez / cfb