Until recently, performance improvement was not difficult. Processors just kept getting faster. Waiting a year for the customer's hardware to be upgraded was a valid optimization strategy. Nowadays, however, individual processors don't get much faster; systems just get more of them.
Much has been written about programming paradigms for multicore processors, but the data-parallel paradigm is a newer approach that may turn out to be easier to program and easier for processor manufacturers to implement.