Musical genres are categorized by humans, based on human hearing. Categories share common characteristics related to the instrumentation, rhythmic structure, and harmonic content of the music. Currently, much music is still classified manually. An automated system for musical genre classification can assist or replace this manual work. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored.
Three feature sets representing timbre texture, rhythmic content, and pitch content are proposed. Classification through a two-stage KNN method is also proposed and shown to enhance accuracy: two-stage KNN classification increases accuracy by about 5% over single-stage KNN classification, with two-stage accuracy of 77.9% versus 73.3% for a single stage.

Index Terms – Music classification, feature extraction, wavelets, KNN classification

Table of Contents
I. Introduction
II. Music Modeling & Genre Segmentation
III. Feature Extraction
   A. Timbre Texture Features
      i. Spectral shape features
      ii. Mel-frequency cepstral coefficients (MFCC)
      iii. Texture window
      iv. Low-Energy feature
   B. Rhythmic Features
   C. Pitch Content Features
IV. Classification
V. Evaluation and Discussion
VI. References

I. Introduction
Musical genre categories share common characteristics related to the instrumentation, rhythmic structure, and harmonic content of the music. The importance of genre classification was magnified when the music industry moved from CD to the web, where music is distributed in large volumes.
Currently, much music is still classified manually; an automated system for musical genre classification can assist or replace this manual work. The web era has enabled access to large amounts of all kinds of data such as music, movies, and news. Music databases have grown exponentially since the first perceptual coders appeared in the early 1990s. As databases grew, tools were needed to search, retrieve, and handle large amounts of data, and classifying musical genre became a valuable tool for searching, retrieving, and handling large music databases [1-3].
There are several related tasks as well, such as music emotion classification, beat tracking, and preference recommendation. Musical genre classifications are created and used to categorize and describe music. Musical genre has no precise definition or boundaries because it is categorized by human hearing. Genre classifications are highly related to public marketing as well as historical and cultural factors. Different countries and organizations have different genre lists, and they even define the same genre with different definitions.
It is therefore hard to define certain genres precisely, and to date there is no official specification of music genres. There are about 500 to 800 genres in music [7, 8]. Some researchers have suggested definitions of musical genre; after several attempts, researchers found that a genre shares certain characteristics such as instrumentation, rhythmic structure, and pitch content. Genre hierarchies were created by human experts and are currently used to classify music on the web.
Automatic musical genre classification can automate the classifying process and provide an important component of a complete music information system. The most significant proposal to specifically deal with this task was released in 2002. Several strategies dealing with related problems have been proposed in other research areas. In this paper, the automatic musical genre classification shown in Figure 1 is proposed. For feature extraction, three sets of features representing instrumentation (timbre), rhythmic content, and pitch content are proposed.

Figure 1 Automatic Musical Genre Classification

II. Music Modeling & Genre Segmentation
An untrained, non-expert person can detect the genre of a song with 72% accuracy by hearing a three-second segment of the song. However, a computer is not designed like the human brain, so it cannot perform genre classification the way a human does. Although using the whole song may somewhat dilute how representative any single feature is, it allows most of the features the music contains to be extracted. Moreover, automatically extracting a short segment is unsuited to the purpose because of the difficulty of finding the exact section that represents the music's character; using the whole song for modeling is therefore the proper approach.
There are too many music genres in use on the web [7, 8]. The set of genres to classify must be simplified; this paper proposes the genres popularly used in MP3 players on the market.

Figure 2 Taxonomy of Music Genre

III. Feature Extraction
Feature extraction is the process of computing a numerical representation that can be used to characterize a segment of audio and classify its genre. A digital music file contains data sampled from an analog audio signal, and its size is huge compared to its actual information content. Features are therefore extracted from the audio signal to obtain more meaningful information and reduce the processing load.
For feature extraction, three sets of features representing instrumentation (timbre), rhythmic content, and pitch content will be used.

1. Timbre Texture Features
The features used to represent timbre texture are based on features originally proposed for speech recognition. The following specific features are usually used to represent timbre texture.

@ Spectral shape features [1-3]
Spectral shape features are computed directly from the power spectrum of an audio signal frame, describing the shape and characteristics of the power spectrum.
The calculated features are based on the short-time Fourier transform (STFT) and are calculated for every short-time frame of sound. Several features can be extracted from the spectral shape.

1. Spectral centroid is the center of gravity of the magnitude spectrum of the STFT and is a measure of spectral brightness:

   C_t = (Σ_{n=1}^{N} n · M_t[n]) / (Σ_{n=1}^{N} M_t[n])

where n is the frequency bin and M_t[n] is the magnitude of the Fourier transform at frame t.

2. Spectral roll-off is the frequency R_t below which 85% of the magnitude distribution is concentrated; it measures the spectral shape:

   Σ_{n=1}^{R_t} M_t[n] = 0.85 · Σ_{n=1}^{N} M_t[n]

3. Spectral flux is the squared difference between the normalized magnitudes of successive spectral distributions; it measures the amount of local spectral change:

   F_t = Σ_{n=1}^{N} (N_t[n] − N_{t−1}[n])²

where N_t[n] is the normalized magnitude of the Fourier transform.

4. Time-domain zero crossings measure the noisiness of the signal; a higher value indicates noisier data:

   Z_t = (1/2) Σ_{n=1}^{N} |sign(x[n]) − sign(x[n−1])|

@ Mel-frequency cepstral coefficients (MFCC) [11]
MFCCs are considered a dominant feature set in speech recognition and are widely used in music signal processing.
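As an illustration, the four spectral shape features can be computed from framed audio with NumPy. This is a minimal sketch: the framing itself, the `spectral_features` helper name, and the small epsilon guards are assumptions, not the paper's implementation.

```python
import numpy as np

def spectral_features(frames):
    """Compute the four spectral shape features for each frame.

    frames: 2-D array of shape (num_frames, frame_length) holding
    time-domain samples (hypothetical framing; hop/window choices vary).
    """
    mag = np.abs(np.fft.rfft(frames, axis=1))        # M_t[n]
    bins = np.arange(1, mag.shape[1] + 1)            # frequency bin index n

    # Spectral centroid: C_t = sum(n * M_t[n]) / sum(M_t[n])
    centroid = (mag * bins).sum(axis=1) / (mag.sum(axis=1) + 1e-12)

    # Spectral roll-off: first bin below which 85% of the magnitude lies
    cumulative = np.cumsum(mag, axis=1)
    rolloff = np.argmax(cumulative >= 0.85 * cumulative[:, -1:], axis=1)

    # Spectral flux: squared difference of successive normalized spectra
    norm = mag / (np.linalg.norm(mag, axis=1, keepdims=True) + 1e-12)
    flux = np.sum(np.diff(norm, axis=0) ** 2, axis=1)

    # Zero crossings: Z_t = 1/2 * sum |sign(x[n]) - sign(x[n-1])|
    signs = np.sign(frames)
    zcr = 0.5 * np.abs(np.diff(signs, axis=1)).sum(axis=1)

    return centroid, rolloff, flux, zcr
```

Note that flux is defined between successive frames, so it has one fewer entry than the other three features.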
Figure 3 Flow chart of MFCC computation

MFCCs are largely independent of the pitch and tone of the audio signal, and thus can be an excellent feature set for speech recognition and audio processing. The log energy of the signal frame together with the cepstral coefficients of the spectrum, that is, a 13-dimensional feature set, is the basic MFCC representation for an audio signal frame.

@ Texture window [1, 2]
All timbre features mentioned above are computed within a small frame (about 10-60 ms) over the whole audio signal; that is, a song is broken into many small frames and the timbre features of each frame are computed.
However, in order to capture the longer-term variation of the signal, the so-called "texture", the actual features used in the automatic system are the running means or variances of the features described above, taken over a number of small frames. The texture window is the term used for this larger window. For example, in one system a small frame of 23 ms (512 samples at a 22,050 Hz sampling rate) and a texture window of 1 s (43 analysis windows) is used.

@ Low-Energy feature
The low-energy feature is widely used. It measures the percentage of frames that have root mean square (RMS) energy less than the average RMS energy over the whole signal.
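A minimal sketch of the texture-window statistics and the low-energy feature follows; the frame length, the 43-frame window, and both function names are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def low_energy(signal, frame_len=512):
    """Percentage of frames whose RMS energy is below the average RMS
    energy over the whole signal (frame_len is an assumed parameter)."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))        # per-frame RMS
    return 100.0 * np.mean(rms < rms.mean())

def texture_stats(frame_features, window=43):
    """Running mean and variance of per-frame features over a texture
    window of `window` analysis frames (43 frames ~ 1 s at 23 ms/frame)."""
    out = []
    for i in range(len(frame_features) - window + 1):
        chunk = frame_features[i:i + window]
        out.append((chunk.mean(axis=0), chunk.var(axis=0)))
    return out
```

For a signal that is half silence and half constant tone, `low_energy` returns 50, matching the intuition that silent frames fall below the average energy.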
It thus measures the amplitude distribution of the signal: for example, vocal music with silences has a large low-energy value, while continuous strings have a smaller one.

2. Rhythmic Features
Rhythmic features describe the periodicity of the audio signal. The beat histogram is computed by the following pipeline: a discrete wavelet transform splits the signal into octave frequency bands; envelope extraction is performed per band (full-wave rectification, low-pass filtering, downsampling, and mean removal); then autocorrelation and multiple peak picking yield the beat histogram.

Figure 4 Beat histogram calculation flow diagram

Tempo induction is used to measure the number of beats per minute (BPM) and the inter-beat interval.
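The envelope-and-autocorrelation approach to tempo induction can be sketched as follows. This is a simplified single-band version: the downsampling factor, the moving-average low-pass filter, the BPM search range, and the function name are all assumptions, not the paper's pipeline.

```python
import numpy as np

def estimate_bpm(signal, sr=22050, bpm_range=(40, 200)):
    """Rough tempo induction: full-wave rectification, low-pass
    smoothing, downsampling, mean removal, autocorrelation, peak picking."""
    hop = 90                                     # downsampling factor (assumed)
    env_sr = sr / hop                            # envelope sampling rate
    env = np.abs(signal)                         # full-wave rectification
    env = np.convolve(env, np.ones(hop) / hop, mode="same")  # low-pass filter
    env = env[::hop]                             # downsampling
    env = env - env.mean()                       # mean removal
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # autocorrelation
    lo = int(env_sr * 60 / bpm_range[1])         # lag bounds for BPM range
    hi = int(env_sr * 60 / bpm_range[0])
    lag = lo + int(np.argmax(ac[lo:hi]))         # peak picking
    return 60.0 * env_sr / lag
```

A click train with clicks every 0.5 s produces an autocorrelation peak near the half-second lag, so the estimate comes out close to 120 BPM; the narrow search range avoids the classic half-tempo octave error.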
Beat tracking uses band-pass filters and comb filters to extract the beat from musical signals of arbitrary musical structure containing arbitrary timbres. The simplest method is to calculate the beat histogram.

Figure 5 Examples of beat histograms

In Figure 5, rock and hip-hop show higher BPM peaks with stronger strength than classical and jazz music. The histogram is intuitive, since the rhythms of rock and hip-hop music are bouncy while classical and jazz music are gentle. Therefore, beat tracking is a good feature for genre classification. Melody is the term used to describe the pattern of the music.
Features used to measure melody include the histogram of the audio signal, peak detection, pitch, autocorrelation in the temporal and frequency domains, and zero crossings in the time domain.

3. Pitch Content Features
The pitch content feature set is based on multiple pitch detection techniques. More specifically, the multiple pitch detection algorithm described by Tolonen and Karjalainen is utilized, which is based on a generalized autocorrelation

   x̂ = IDFT( |DFT(x)|^k )

where the exponent k determines the amount of frequency-domain compression (k = 2 gives the ordinary autocorrelation).

IV. Classification
With the features extracted by the methods above, music genre can be classified in a more standardized manner.
Once the features of a song are extracted, there is a high-dimensional feature space to be classified. Data-mining algorithms classify this space with unsupervised or supervised approaches. In this paper, classification is done with the supervised approach, which has been studied more extensively. A system designed with a supervised approach is first trained on manually labeled data; that is, the system knows the genres of the training songs. When unlabeled (newly arriving) data comes in, the trained system is used to classify it into a known genre.
K-Nearest Neighbor (KNN) is a supervised classification algorithm in which newly arriving data is assigned the majority category of its K nearest neighbors.

Figure 7 KNN

In Figure 7, data points with known genres (red, green, and blue) are scattered in the high-dimensional feature space. When a new song to be classified enters the feature space (marked with a star in Figure 7), the number of samples to compare against the star is decided. The distance between positions is commonly measured by equation (1), the Minkowski metric:

   d(x, y) = (Σ_i |x_i − y_i|^p)^(1/p)    (1)

The most widely used distance metric for continuous features is the Euclidean distance, commonly used to calculate the distance between objects in Euclidean space. The Euclidean distance is the special case of the Minkowski metric with p = 2:

   d(x, y) = sqrt(Σ_i (x_i − y_i)²)    (2)

Setting K = 5 takes the five samples nearest to the star. If, as in Figure 7, three of these neighbors are blue, one is red, and one is green, the genre of the new song is classified as blue.

V. Evaluation and Discussion
Musical genre classification has several inherent problems: the boundaries between music genres are ambiguous, and a song may involve several genre styles.
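A minimal KNN sketch using the Minkowski distance follows; the function names, data layout, and tie-breaking behavior are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance between vectors; p = 2 is the Euclidean case."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def knn_classify(train_X, train_y, x, k=5, p=2):
    """Classify feature vector x by majority vote among its k nearest
    labeled neighbors (labels here would be genre names)."""
    dists = [minkowski(t, x, p) for t in train_X]
    nearest = np.argsort(dists)[:k]              # indices of k closest songs
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority genre
```

With four "blue" training points near the query and one "red" among the five nearest, the vote resolves to "blue", mirroring the Figure 7 example.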
This indicates that genre classification is not easy; the problem of fuzzy boundaries occurs not only for machines but also for humans. Furthermore, with a supervised classification approach the database set can be a key variable: results differ depending on which database set is used. In this paper, the database set contains at least 7 songs for each category, and whole-file classification is used. It takes much more time to process than real-time frame classification, but it has an advantage in accuracy and can avoid data distortion. Using the KNN method twice may further reduce errors.
Figure 8 shows the taxonomy of music genres. In the first KNN stage, KNN classifies a song into one of the larger genre groups: Classic, Jazz, Rock, R&B/Hip-Hop, and Pop.

Figure 8 Taxonomy of Music Genre

In the second KNN stage, the KNN method is applied inside the chosen genre group. For example, if a new song is sorted into Classic in the first stage, the second stage decides its place among Orchestral, Ensemble, and Voice (Vocal). During the second stage the song cannot move from Classic to Jazz, Pop, or any other top-level genre.
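The two-stage scheme can be sketched as follows, assuming feature vectors stored in one array with parallel lists of top-level and subgenre labels; the function names and label layout are illustrative, not the paper's exact setup.

```python
import numpy as np
from collections import Counter

def knn(train_X, train_y, x, k=5):
    """Plain Euclidean KNN vote (helper assumed for this sketch)."""
    d = np.linalg.norm(train_X - x, axis=1)
    idx = np.argsort(d)[:k]
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]

def two_stage_classify(X, top_labels, sub_labels, x, k=5):
    """First stage picks a top-level genre; second stage reruns KNN
    restricted to training songs inside that genre's group, so a song
    sorted into Classic can only land in a Classic subgenre."""
    top = knn(X, top_labels, x, k)               # first-stage KNN
    mask = [t == top for t in top_labels]        # restrict to chosen group
    sub_X = X[mask]
    subs = [s for s, m in zip(sub_labels, mask) if m]
    return top, knn(sub_X, subs, x, min(k, len(subs)))
```

Because the second vote only sees neighbors from the winning group, errors from distant genres cannot leak into the subgenre decision, which is the intuition behind the reported accuracy gain.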