Off-chip replacement misses (capacity and conflict) and coherent read misses in a distributed shared memory system stall execution for hundreds of cycles. These misses recur and form sequences of two or more misses, called streams. Prior streaming techniques ignore two problems while streaming data: the reordering of misses and streams that have not been accessed recently. In this paper, we present a stream prefetcher design that deals with both problems. Our design uses stream waiting rooms to store not-recently-accessed streams, which helps remove more off-chip misses. In trace-based simulation, our stream prefetcher removes 8% to 66% (40% on average) of replacement misses and 17% to 63% (39% on average) of coherent read misses. In cycle-accurate full-system simulation of a distributed shared memory system, our design achieves speedups from 1.00 to 1.17 on the Princeton Application Repository for Shared-Memory Computers (PARSEC) workloads, with the exception of the dedup and swaptions workloads.
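The stream-waiting-room idea can be illustrated with a toy prefetcher. This is a minimal sketch under stated assumptions, not the authors' design: the class `StreamPrefetcher`, its capacities, the prefetch `depth`, and the policy of keying each stream by its first miss address are all hypothetical, and miss reordering is not modeled. Streams evicted from a small LRU table of active streams are parked in a waiting room instead of being discarded, so a later miss on a not-recently-accessed stream can still trigger prefetches.

```python
from collections import OrderedDict


class StreamPrefetcher:
    """Toy temporal-stream prefetcher with a stream waiting room.

    Hypothetical illustration only: capacities, depth, and keying streams by
    their first miss address are assumptions, and miss reordering is not modeled.
    """

    def __init__(self, active_capacity=2, waiting_capacity=4, depth=2):
        self.active = OrderedDict()   # head address -> recorded stream, in LRU order
        self.waiting = OrderedDict()  # not-recently-accessed streams evicted from `active`
        self.active_capacity = active_capacity
        self.waiting_capacity = waiting_capacity
        self.depth = depth            # number of addresses prefetched after the head miss

    def record(self, stream):
        """Store a recorded miss sequence; a stream needs two or more misses."""
        if len(stream) >= 2:
            self._insert_active(stream[0], list(stream))

    def _insert_active(self, head, stream):
        self.active[head] = stream
        self.active.move_to_end(head)
        if len(self.active) > self.active_capacity:
            # Park the least-recently-used stream in the waiting room instead of dropping it.
            old_head, old_stream = self.active.popitem(last=False)
            self.waiting[old_head] = old_stream
            if len(self.waiting) > self.waiting_capacity:
                self.waiting.popitem(last=False)  # only now is the oldest stream discarded

    def on_miss(self, addr):
        """Return the addresses to prefetch when a miss occurs on `addr`."""
        if addr in self.active:
            stream = self.active[addr]
            self.active.move_to_end(addr)
        elif addr in self.waiting:
            # A parked stream is being reused: promote it back to the active table.
            stream = self.waiting.pop(addr)
            self._insert_active(addr, stream)
        else:
            return []
        return stream[1:1 + self.depth]
```

With `active_capacity=1`, recording a second stream pushes the first into the waiting room; a later miss on the first stream's head address still returns its next two addresses, whereas a plain LRU table of the same size would already have forgotten that stream.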
A hierarchical approach was used to solve mixed-mode placement for three-dimensional (3-D) integrated circuit design. The 3-D placement flow consists of hierarchical clustering, hierarchical 3-D floorplanning, vertical via mapping, and recursive two-dimensional (2-D) global/detailed placement phases. State-of-the-art clustering and de-clustering phases reduced the design complexity, improving the efficiency and capacity of the placement algorithm. The 3-D floorplanning phase solved the layer assignment problem and controlled the number of vertical vias. Vertical via mapping transformed the 3-D placement problem into a set of 2-D placement sub-problems, which not only simplifies the original 3-D placement problem but also produces the vertical via assignment for the routing phase. The design optimizes both wire length and thermal load in the floorplanning and placement phases to improve the performance and reliability of 3-D integrated circuits. Experiments on IBM benchmarks show that, with two to four stacked layers, the total wire length is reduced by 15% to 35% relative to 2-D placement, while the number of vertical vias is minimized to satisfy a predefined upper bound. With two-stage optimization on four stacked layers, the maximum temperature is reduced by 16%.
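The vertical via mapping step can be pictured at the level of a single net. The sketch below is illustrative only and assumes details the abstract does not give: pins are `(x, y, layer)` tuples, the 3-D cost is 2-D half-perimeter wire length (HPWL) plus a hypothetical `via_weight` per layer crossed, and all vias of a net are mapped to one given `(x, y)` position. Under those assumptions, a multi-layer net splits into one 2-D subnet per layer, each carrying a pseudo-pin at the via location, so a 2-D placer can treat each layer independently. Thermal cost is not modeled.

```python
from collections import defaultdict


def hpwl(points):
    """Half-perimeter wire length of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def net_cost_3d(pins, via_weight):
    """Hypothetical 3-D net cost: 2-D HPWL of all pins plus a penalty per layer crossed."""
    layers = [z for _, _, z in pins]
    return hpwl([(x, y) for x, y, _ in pins]) + via_weight * (max(layers) - min(layers))


def split_net_by_layer(pins, via_xy):
    """Map a net's vertical vias to via_xy and split the net into per-layer 2-D subnets.

    Every layer the net spans, including pass-through layers, gets a pseudo-pin
    at via_xy so that each subnet can be placed by an ordinary 2-D placer.
    A net confined to one layer needs no via and is returned unchanged.
    """
    layers = [z for _, _, z in pins]
    lo, hi = min(layers), max(layers)
    subnets = defaultdict(list)
    for x, y, z in pins:
        subnets[z].append((x, y))
    if hi > lo:
        for z in range(lo, hi + 1):
            subnets[z].append(via_xy)  # via pseudo-pin on this layer
    return dict(subnets)
```

For example, the pins `(0, 0, 0)`, `(4, 2, 0)`, and `(3, 5, 1)` with vias mapped to `(2, 2)` split into the layer-0 subnet `[(0, 0), (4, 2), (2, 2)]` and the layer-1 subnet `[(3, 5), (2, 2)]`.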
Traffic classification is critical to effective network management. However, the growing number of proprietary, encrypted, and dynamic protocols makes traditional traffic classification methods less effective. A Message and Command Correlation (MCC) method was developed to identify interactive protocols (such as P2P file-sharing protocols and Instant Messaging (IM) protocols) through session analysis. Unlike traditional packet-based classification approaches, this method exploits application session information by clustering packets into application messages, which are then used for classification. The efficacy and accuracy of the MCC method were evaluated on real-world traffic, including the P2P file-sharing protocols Thunder and BitTorrent and the IM protocols QQ and GTalk. The tests show a false positive rate below 3% and a false negative rate below 8%, and MCC needs to check only 8.7% of the packets, or 0.9% of the traffic. This approach therefore has great potential for discovering new types of interactive application protocols accurately and quickly.
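Clustering packets into application messages can be sketched as follows. The abstract does not give MCC's clustering rules, so this is a minimal sketch under assumed, hypothetical rules: a new message starts when the packet direction changes or when the idle gap since the previous payload-carrying packet exceeds `idle_gap`, and zero-payload packets such as pure ACKs are ignored. The helper `session_signature` summarizes only the first `k` messages, illustrating how a message-level classifier could decide after inspecting just the start of a session.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    ts: float       # capture timestamp in seconds
    direction: int  # 0 = client->server, 1 = server->client
    size: int       # payload bytes


@dataclass
class Message:
    direction: int
    size: int       # total payload bytes of the packets in this message
    packets: int    # number of packets clustered into this message


def cluster_messages(packets: List[Packet], idle_gap: float = 0.5) -> List[Message]:
    """Group one session's packets into application messages (hypothetical rules).

    A new message starts on a direction change or when the gap since the previous
    payload-carrying packet exceeds idle_gap; zero-payload packets are skipped.
    """
    messages: List[Message] = []
    last_ts: Optional[float] = None
    for p in packets:
        if p.size == 0:
            continue  # e.g. pure ACKs carry no application data
        starts_new = (not messages
                      or p.direction != messages[-1].direction
                      or p.ts - last_ts > idle_gap)
        if starts_new:
            messages.append(Message(p.direction, p.size, 1))
        else:
            messages[-1].size += p.size
            messages[-1].packets += 1
        last_ts = p.ts
    return messages


def session_signature(messages: List[Message], k: int = 3) -> Tuple[Tuple[int, int], ...]:
    """Summarize only the first k messages as (direction, size) pairs for matching."""
    return tuple((m.direction, m.size) for m in messages[:k])
```

The direction-change rule keeps request and response payloads apart, while the idle-gap threshold separates back-to-back messages sent in the same direction; both the rule set and the default threshold are illustrative choices, not values from the MCC work.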