[SOLVED] Swift Merge audio and video files into one video


This Content is from Stack Overflow. Question asked by Sanzhar

I created the application for merging video with audio and trying merge them. As audio i mean Data type with path in user’s device.

I can play my audio via AVAudioPlayer but everytime when I’m trying to merge video with my audio it doesn’t work because AVAsset doesn’t have an AVTrack, possible way i found in this question: Swift Merge audio and video files into one video

In result I get video without sound, can you help me to write some function which will get params (video: AVAsset, audio: Data) and answer for this question? please


Improved code (of Govind’s answer) with some additional features:

  1. Merge audio of the video + external audio (the initial answer was dropping the sound of the video)
  2. Flip video horizontally if needed (I personally use it when user captures using frontal camera, btw instagram flips it too)
  3. Apply preferredTransform correctly which solves the issue when video was saved rotated (video is external: captured by other device/generated by other app)
  4. Removed some unused code with VideoComposition.
  5. Added a completion handler to the method so that it can be called from a different class.
  6. Update to Swift 4.

Step 1.

import UIKit
import AVFoundation
import AVKit
import AssetsLibrary

Step 2.

/// Merges video and sound while keeping sound of the video too
/// - Parameters:
///   - videoUrl: URL to video file
///   - audioUrl: URL to audio file
///   - shouldFlipHorizontally: pass True if video was recorded using frontal camera otherwise pass False
///   - completion: completion of saving: error or url with final video
func mergeVideoAndAudio(videoUrl: URL,
                        audioUrl: URL,
                        shouldFlipHorizontally: Bool = false,
                        completion: @escaping (_ error: Error?, _ url: URL?) -> Void) {

    let mixComposition = AVMutableComposition()
    var mutableCompositionVideoTrack = [AVMutableCompositionTrack]()
    var mutableCompositionAudioTrack = [AVMutableCompositionTrack]()
    var mutableCompositionAudioOfVideoTrack = [AVMutableCompositionTrack]()

    //start merge

    let aVideoAsset = AVAsset(url: videoUrl)
    let aAudioAsset = AVAsset(url: audioUrl)

    let compositionAddVideo = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                                   preferredTrackID: kCMPersistentTrackID_Invalid)

    let compositionAddAudio = mixComposition.addMutableTrack(withMediaType: AVMediaTypeAudio,
                                                                 preferredTrackID: kCMPersistentTrackID_Invalid)

    let compositionAddAudioOfVideo = mixComposition.addMutableTrack(withMediaType: AVMediaTypeAudio,
                                                                        preferredTrackID: kCMPersistentTrackID_Invalid)

    let aVideoAssetTrack: AVAssetTrack = aVideoAsset.tracks(withMediaType: AVMediaTypeVideo)[0]
    let aAudioOfVideoAssetTrack: AVAssetTrack? = aVideoAsset.tracks(withMediaType: AVMediaTypeAudio).first
    let aAudioAssetTrack: AVAssetTrack = aAudioAsset.tracks(withMediaType: AVMediaTypeAudio)[0]

    // Default must have tranformation
    compositionAddVideo.preferredTransform = aVideoAssetTrack.preferredTransform

    if shouldFlipHorizontally {
        // Flip video horizontally
        var frontalTransform: CGAffineTransform = CGAffineTransform(scaleX: -1.0, y: 1.0)
        frontalTransform = frontalTransform.translatedBy(x: -aVideoAssetTrack.naturalSize.width, y: 0.0)
        frontalTransform = frontalTransform.translatedBy(x: 0.0, y: -aVideoAssetTrack.naturalSize.width)
        compositionAddVideo.preferredTransform = frontalTransform


    do {
        try mutableCompositionVideoTrack[0].insertTimeRange(CMTimeRangeMake(kCMTimeZero,
                                                            of: aVideoAssetTrack,
                                                            at: kCMTimeZero)

        //In my case my audio file is longer then video file so i took videoAsset duration
        //instead of audioAsset duration
        try mutableCompositionAudioTrack[0].insertTimeRange(CMTimeRangeMake(kCMTimeZero,
                                                            of: aAudioAssetTrack,
                                                            at: kCMTimeZero)

        // adding audio (of the video if exists) asset to the final composition
        if let aAudioOfVideoAssetTrack = aAudioOfVideoAssetTrack {
            try mutableCompositionAudioOfVideoTrack[0].insertTimeRange(CMTimeRangeMake(kCMTimeZero,
                                                                       of: aAudioOfVideoAssetTrack,
                                                                       at: kCMTimeZero)
    } catch {

    // Exporting
    let savePathUrl: URL = URL(fileURLWithPath: NSHomeDirectory() + "/Documents/newVideo.mp4")
    do { // delete old video
        try FileManager.default.removeItem(at: savePathUrl)
    } catch { print(error.localizedDescription) }

    let assetExport: AVAssetExportSession = AVAssetExportSession(asset: mixComposition, presetName: AVAssetExportPresetHighestQuality)!
    assetExport.outputFileType = AVFileTypeMPEG4
    assetExport.outputURL = savePathUrl
    assetExport.shouldOptimizeForNetworkUse = true

    assetExport.exportAsynchronously { () -> Void in
        switch assetExport.status {
        case AVAssetExportSessionStatus.completed:
            completion(nil, savePathUrl)
        case AVAssetExportSessionStatus.failed:
            print("failed \(assetExport.error?.localizedDescription ?? "error nil")")
            completion(assetExport.error, nil)
        case AVAssetExportSessionStatus.cancelled:
            print("cancelled \(assetExport.error?.localizedDescription ?? "error nil")")
            completion(assetExport.error, nil)
            completion(assetExport.error, nil)


Again thanks to @Govind’s answer! It helped me a lot!

Hope this update helps someone too:)

This Question was asked in StackOverflow by Kei Maejima and Answered by Tung Fam It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?