MLOC Cleaning

Cleaning

An important part of the relocation process with mloc, whether calibrated or not, is a sequence of cycles in which the current estimates of empirical reading errors are used to identify outlier readings, which are then flagged so that they will not be used in subsequent relocations. In the following relocation, estimates of empirical reading errors will tend to be smaller because of the filtering of outliers and the improvement in the locations of the clustered events. The process of identifying outliers is therefore iterative, and it must be repeated until convergence. In this context, convergence means that the distribution of residuals for a given station-phase is consistent with the current estimate of spread. As outlier readings are flagged, the distribution is expected to evolve toward a normal distribution with standard deviation equal to the empirical reading error. We generally continue this cleaning process until all readings used in the relocation are within 3σ of the mean for that station-phase, where σ is the current estimate of empirical reading error for the relevant station-phase.
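Schematically, each cleaning cycle looks like this:

1. Relocate the cluster using the currently unflagged readings.
2. Re-estimate the empirical reading error (spread) of each station-phase from the cluster residuals.
3. Flag readings whose cluster residuals exceed the current threshold.
4. Lower the threshold gradually and repeat until every remaining reading lies within 3σ for its station-phase.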

Strategy

In this section I will describe in some detail the specific steps I would normally take in analyzing a new cluster.

For the first run, turn off inverse weighting with command weig (“weig off”). Don’t put this in the command file; issue it interactively, because you may only use it once or twice. You could use the default weighting for this step, but turning weighting off completely makes for a more robust inversion when many outliers are likely. Set the threshold for command lres to 3, again interactively, after reading the command file with command cfil. When using empirical reading errors later in the analysis you would not use a value as small as 3 until the final few runs, but when weighting is turned off every station-phase has an effective reading error of 1 second, and 3-second residuals are nearly always safe to flag as outliers.
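A first-run session might therefore look something like the sketch below. The basename and command-file name (cluster1, cluster1.cfil) are hypothetical, the parenthetical notes are annotations rather than input, and the exact prompts and the command that starts the relocation (assumed here to be run) should be checked against your version of mloc:

   mloc
   cfil cluster1.cfil   (read the standing commands for this cluster)
   weig off             (turn off inverse weighting for this run only)
   lres 3               (safe threshold when effective reading errors are 1 s)
   run                  (start the relocation)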

After the first run of mloc with the settings above, it is safe to simply run the utility program lres and flag everything in the ~.lres file automatically. Don’t run xdat yet, however, because mloc does not yet have the information from the ~.ttsprd file that is needed to correct for the offsets of the windows for different phases.

In most cases you can set up your next run of mloc to use the empirical reading errors determined in the first run. This means adding a line to your command file with the rfil command, referencing the ~.rderr file from the first run. Add lines for commands rhdf and tfil as well, referencing the appropriate output files from the first run. You are now using empirical reading errors that may already be rather small, but there will still be many gross outliers, so the threshold for lres should be large; 5 or 6 is usually adequate.
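The added lines might look something like the following sketch, in which cluster1.1 is a hypothetical basename for the first run’s output files (check the actual file names in your working directory) and the parenthetical notes are annotations, not part of the command file:

   rfil cluster1.1.rderr    (empirical reading errors from the first run)
   rhdf cluster1.1.hdf      (updated hypocenters from the first run)
   tfil cluster1.1.ttsprd   (travel-time spreads from the first run)
   lres 5                   (large threshold; many gross outliers remain)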

After this run, inspect the ~.lres file. It will probably be a few tens of kB in size. A few cases with very large values of cluster residual (eci) are not a problem; you can go ahead and run lres on it. If there are many such cases, however, it may be wise to hand-edit with the utility program rstat to clean out the very largest outliers first. The reason is that in these early stages very large outliers can make readings that are actually good look like outliers: their residuals won’t be as large and will have the opposite sign. Alternatively, you could re-run with a larger threshold for command lres and then run the utility program lres.

Now you are into a repeating pattern: trim the largest outliers, obtain improved estimates of the empirical reading errors, and re-run, until you find few readings with cluster residuals greater than 3. The reduction in threshold value that you specify for each run with command lres should be guided by how many outliers you are getting at each run. Small bites are better, but there’s no need to waste time, either. It is safest to make at least two runs at each level, because it takes two runs for the consequences of flagging outliers to be reflected in the new empirical reading errors. A good approach is to keep running at a given level until there are only a few outliers left at that level; it may take a half-dozen or more runs. From lres=5.0 I would normally drop to lres=4.0, then possibly 3.5, and finally 3.0.
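As a rough sketch, the threshold schedule over a sequence of runs might look like this, with at least two runs at each level and more if a level keeps producing outliers:

   lres 5.0   (two or more runs)
   lres 4.0   (two or more runs)
   lres 3.5   (optional intermediate level)
   lres 3.0   (repeat until only a few readings exceed 3)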

During this process other adjustments are taking place as well. The two main ones are adjusting the local velocity model, if one is being used, to fit the observed travel time data at local distances, and constraining focal depths. Both activities obviously impact the cleaning process, as well as each other.

Another part of this process is making sure all the data are being used, by checking the ~.stn file for missing stations or conflicts. It is also necessary to occasionally check the “bad data” section of the ~.phase_data file to look for signs of readings that may need to be unflagged.

For a direct calibration analysis it is especially important to keep a close eye on the cleaning process as it applies to the readings being used to locate the hypocentroid. That is the main reason the output file ~.dcal_phase_data exists.

Another way of spotting problems during cleaning is to review the empirical reading errors, either by scanning through the ~.phase_data file or by inspecting the ~.rderr file itself. When the number of samples for a particular station-phase is small, as is often the case, the robust estimator of spread is not very good at distinguishing outliers and will simply produce an overly large estimate of empirical reading error. It does not take a great deal of seismological expertise to spot these situations, or to know which readings should be flagged. Similarly, scanning the converged-residual column of the ~.phase_data file for unusually large residuals will reveal outliers that the automatic process may miss. These kinds of problems are most common among secondary phases, especially regional and teleseismic S phases, which do not ultimately have much impact on the location analysis, but it is good practice to give them some attention.