I would work in Maya first. Build the 3D background using a still from your 2D animation as reference, then render each 3D frame with an alpha channel (leaving out the reference image), and composite the layers in After Effects for an animation or Photoshop for a still. That works fine for your animation as long as the background isn't moving. Judging from the camera view in your short clip, only the character moves.
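In case it helps to see what the compositing step does with the alpha channel, here is a rough sketch of the standard "over" operation in plain Python. This is not how After Effects or Photoshop are implemented, just the basic math, assuming straight (non-premultiplied) RGBA values between 0.0 and 1.0. The function names `over` and `composite_frame` are made up for this example.

```python
def over(top, bottom):
    """Composite one straight-alpha RGBA pixel over another ('over' operator)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels fully transparent: nothing to show.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t, b):
        # Weight each color by its own alpha, then un-premultiply the result.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite_frame(character, background):
    """Lay a character frame over a background frame, pixel by pixel.

    Both frames are lists of rows of RGBA tuples with the same dimensions.
    """
    return [
        [over(c, b) for c, b in zip(char_row, bg_row)]
        for char_row, bg_row in zip(character, background)
    ]
```

Wherever the character layer has alpha 0, the rendered background shows through untouched. Wherever it has alpha 1, the character covers it completely.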
As for getting rid of the white: first save each image of your original 2D animation with an alpha channel. Then in Maya, create a plane, create a material (a Blinn) in Hypershade, and map its color to your image with the alpha. This might help you out a little. I'm not sure how you would do an entire animation, though.
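If your frames don't have an alpha channel yet, one crude way to get one is to key out the white. Below is a minimal sketch in plain Python, assuming each frame is a list of rows of 8-bit RGB tuples. The names `white_to_alpha`, `key_out_white`, and the `threshold` default of 250 are just assumptions for illustration, not part of Maya or any image library.

```python
def white_to_alpha(pixel, threshold=250):
    """Turn an 8-bit RGB pixel into RGBA, making near-white fully transparent."""
    r, g, b = pixel
    # A pixel counts as "white" only if all three channels reach the threshold.
    alpha = 0 if min(r, g, b) >= threshold else 255
    return (r, g, b, alpha)


def key_out_white(frame, threshold=250):
    """Apply white_to_alpha to every pixel of one frame (a list of rows)."""
    return [[white_to_alpha(p, threshold) for p in row] for row in frame]


def key_out_white_sequence(frames, threshold=250):
    """Run the same key over every frame of the animation."""
    return [key_out_white(frame, threshold) for frame in frames]
```

Keep in mind that a hard cutoff like this also punches holes in any white inside the character (eyes, highlights) and can leave a light fringe around the edges, so painting or exporting a proper alpha from your 2D program is usually the cleaner route.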
If it were me, I'd bring one image in for reference, build the background in Maya, render the background out with an alpha channel, and composite in your program of choice.
Hope this helped some.
"Terminat hora diem, terminat auctor opus."