AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks (open access)

Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. This growing scale makes debugging the applications that run on them a daunting challenge. Few debugging tools perform well at this scale, and most provide an overload of information about the entire job. Developers need tools that quickly direct them to the root cause of the problem. This paper presents AutomaDeD, a tool that identifies which tasks of a large-scale application first manifest a bug at a specific code region at a specific point during program execution. AutomaDeD creates a statistical model of the application's control-flow and timing behavior that organizes tasks into groups and identifies deviations from normal execution, thus significantly reducing debugging effort. In addition to a case study in which AutomaDeD locates a bug that occurred during development of MVAPICH, we evaluate AutomaDeD on a range of bugs injected into the NAS parallel benchmarks. Our results demonstrate that AutomaDeD detects the time period when a bug first manifested itself with 90% accuracy for stalls and hangs and 70% accuracy for interference faults. It identifies the subset of processes first affected by the fault with 80% accuracy and 70% accuracy, respectively, and the …
Date: March 23, 2010
Creator: Bronevetsky, G; Laguna, I; Bagchi, S; de Supinski, B R; Ahn, D & Schulz, M
Object Type: Article
System: The UNT Digital Library
Statistical Fault Detection for Parallel Applications with AutomaDeD (open access)

Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. The large component count means that these systems fail frequently and often in very complex ways, making them difficult to use and maintain. While prior work on fault detection and diagnosis has focused on faults that significantly reduce system functionality, the wide variety of failure modes in modern systems makes them likely to fail in complex ways that impair system performance but are difficult to detect and diagnose. This paper presents AutomaDeD, a statistical tool that models the timing behavior of each application task and tracks its behavior to identify any abnormalities. If any are observed, AutomaDeD can immediately detect them and report to the system administrator the task where the problem began. This identification of the fault's initial manifestation can provide administrators with valuable insight into the fault's root causes, making it significantly easier and cheaper for them to understand and repair it. Our experimental evaluation shows that AutomaDeD detects a wide range of faults immediately after they occur 80% of the time, with a low false-positive rate. Further, it identifies weaknesses of the current approach that motivate future research.
Date: March 23, 2010
Creator: Bronevetsky, G; Laguna, I; Bagchi, S; de Supinski, B R; Ahn, D & Schulz, M
Object Type: Article
System: The UNT Digital Library
Laboratory Directed Research and Development Program FY 2009 for Lawrence Berkeley National Laboratory (open access)

Berkeley Lab LDRD FY2009 Annual Report
Date: March 23, 2010
Creator: Hansen, Todd C.
Object Type: Report
System: The UNT Digital Library
MMCR Calibration Report (open access)

Calibration report for the Millimeter Wavelength Cloud Radar performed for the ARM Climate Research Facility by ProSensing Inc.
Date: March 23, 2010
Creator: Mead, D.
Object Type: Report
System: The UNT Digital Library
An estimate of collisional beam scattering during final focus in NDCX-II (open access)

The final focus of NDCX-II contains a region with quite high plasma density. We estimate here how much collisional scatter we expect from transit through this plasma. A separate question, not explored here, is how much scatter there might be off of collective fluctuations in the neutralizing plasma, including those driven by the passage of the beam.
Date: March 23, 2010
Creator: Cohen, R.H.
Object Type: Report
System: The UNT Digital Library
Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies (open access)

Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35 percent of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the …
Date: March 23, 2010
Creator: Catfish Genome Consortium
Object Type: Article
System: The UNT Digital Library
Hanford Sludge Simulant Selection for Soil Mechanics Property Measurement (open access)

The current System Plan for the Hanford Tank Farms uses relaxed buoyant displacement gas release event (BDGRE) controls for deep sludge (i.e., high-level waste [HLW]) tanks, which allows the tank farms to use more storage space, i.e., increase the sediment depth, in some of the double-shell tanks (DSTs). The relaxed BDGRE controls are based on preliminary analysis of a gas release model from van Kessel and van Kesteren. Application of the van Kessel and van Kesteren model requires parametric information for the sediment, including the lateral earth pressure at rest and shear modulus. No in situ measurements of lateral earth pressure at rest or shear modulus are currently available for Hanford sludge. The two chemical sludge simulants will be used in follow-on work to experimentally measure the van Kessel and van Kesteren model parameters, lateral earth pressure at rest, and shear modulus.
Date: March 23, 2010
Creator: Wells, Beric E.; Russell, Renee L.; Mahoney, Lenna A.; Brown, Garrett N.; Rinehart, Donald E.; Buchmiller, William C. et al.
Object Type: Report
System: The UNT Digital Library