Telephone systems have a low dynamic range because they are designed for human speakers at an essentially fixed distance from the microphone. As a result, they operate with the signal just below the clipping level.
If you are playing hi-fi classical music from an MP3, it will have a high dynamic range, which means that most of the time it will be far below the clipping level, but it may reach the clipping level at times.
If this is the problem you are having, you should apply dynamic range compression to your MP3s, as AM radio stations do.
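To make the idea concrete, here is a minimal sketch of a dynamic range compressor in plain Python. It is only an illustration: the function name and all parameter values (threshold, ratio, attack, release, make-up gain) are my own assumptions, not recommended settings, and it expects mono samples already decoded to floats between -1.0 and 1.0.

```python
import math

def compress(samples, rate=44100, threshold_db=-20.0, ratio=4.0,
             attack_s=0.005, release_s=0.2, makeup_db=12.0):
    """Feed-forward compressor sketch (all defaults are illustrative assumptions)."""
    # One-pole smoothing coefficients for the level detector.
    atk = math.exp(-1.0 / (rate * attack_s))
    rel = math.exp(-1.0 / (rate * release_s))
    makeup = 10.0 ** (makeup_db / 20.0)
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Follow rising levels quickly (attack) and falling levels slowly (release).
        coeff = atk if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        env_db = 20.0 * math.log10(env) if env > 1e-9 else -180.0
        # Above the threshold, reduce the excess level by the ratio.
        over_db = env_db - threshold_db
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        y = x * makeup * 10.0 ** (gain_db / 20.0)
        # Hard-limit anything that still exceeds full scale.
        out.append(max(-1.0, min(1.0, y)))
    return out
```

With these example settings, a steady full-scale passage settles about 3 dB below full scale, while a passage 40 dB quieter is raised by the 12 dB make-up gain, so the gap between loud and quiet shrinks from 40 dB to about 25 dB.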
Also, it does not make sense to play MP3s on the fly: they are expensive to convert to telephone-friendly formats, and their audio quality is far higher than needed. So when you run the dynamic range compression, also store the output as 8 kHz signed linear (for mu-law and A-law) or, if you are using another codec, in that codec's own format (although most codecs are not good for music).
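As a rough sketch of the "store it as 8 kHz signed linear" step, the function below takes mono float samples (already decoded from the MP3 by some other tool, which is assumed here) and packs them as raw 16-bit little-endian signed PCM at 8 kHz. The function name is hypothetical, and the block averaging is only a crude stand-in for a proper resampling filter.

```python
import struct

def to_sln_8k(samples, src_rate, dst_rate=8000):
    """Downsample mono floats in [-1, 1] and pack as 16-bit LE signed PCM."""
    # Integer arithmetic avoids rounding drift for ratios such as 44100/8000.
    n_out = len(samples) * dst_rate // src_rate
    out = bytearray()
    for i in range(n_out):
        start = i * src_rate // dst_rate
        end = max(start + 1, (i + 1) * src_rate // dst_rate)
        block = samples[start:end]
        # Averaging each block is a crude low-pass filter before decimation.
        avg = sum(block) / len(block)
        value = int(round(max(-1.0, min(1.0, avg)) * 32767))
        out += struct.pack("<h", value)
    return bytes(out)
```

The returned bytes can be written straight to a file opened in binary mode. A dedicated audio tool will filter much better than this, so treat the code as an explanation of the target format and not as a production converter.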
This might be a starting point: https://ubuntuforums.org/showthread.php?t=908758