Patent abstract:
Computing system, device, and method of audio splitting with frame sizes enforced by the codec, and non-transitory computer-readable storage medium. A method and apparatus are described for splitting the audio of media content into separate content files without introducing boundary artifacts.
Publication number: BR112012014872B1
Application number: R112012014872-9
Filing date: 2010-12-21
Publication date: 2020-12-15
Inventor: Calvin Ryan Owen
Applicant: Dish Digital L.L.C.
Primary IPC:
Patent description:

TECHNICAL FIELD
[0001] Embodiments of the invention relate to the field of streaming media content over the Internet and, more specifically, to splitting the audio of media content into separate content files without introducing boundary artifacts.

BACKGROUND
[0002] The Internet is becoming a primary method of distributing media content (for example, video and audio, or audio only) and other information to end users. Today it is possible to download music, videos, games, and other media to computers, cell phones, and virtually any network-capable device. The percentage of people accessing the Internet for media content is growing rapidly. The quality of the viewer's experience is a key barrier to the growth of online video viewing. Consumers' expectations for online video are set by their experience of watching film and television.
[0003] Audience numbers for streaming video on the web are growing rapidly, and there is growing interest in and demand for watching video over the Internet. Streaming of data files, or "streaming media," refers to technology that delivers sequential media content at a rate sufficient to present the media to a user at the originally intended playback speed without significant interruption. Unlike a downloaded media file, streamed data may be stored in memory only until it is played back and may then be deleted after a specified period of time has elapsed.
[0004] Streaming media content over the Internet presents some challenges compared with regular broadcasts over the air, satellite, or cable. A concern that arises in the context of encoding the audio of media content is the introduction of boundary artifacts when segmenting the video and audio into fixed-time portions. In a conventional approach, the audio is segmented into portions having a fixed time duration that coincides with the fixed time duration of the corresponding video, for example, two seconds. In this approach, the audio boundaries are always aligned with the video boundaries. The conventional approach starts a new encoding session of an audio codec to encode each audio portion for each content file, for example, using Advanced Audio Coding - Low Complexity (AAC-LC). When a new encoding session is used for each audio portion, the audio codec interprets the beginning and end of the waveform as transitions from and to zero, resulting in a clicking or popping noise at the portion boundaries when the encoded portion is played back, as illustrated in Figure 1. These click or pop noises are referred to as boundary artifacts. In addition, the audio codec encodes the fixed-time-duration audio according to a codec-enforced frame size. This also introduces boundary artifacts when the number of samples produced for the portion is not evenly divisible by the codec-enforced frame size.
[0005] Figure 1 is a diagram illustrating an exemplary audio waveform 100 for two audio portions using a conventional approach. The audio waveform 100 illustrates the zero transition 102 between the first and second portions. When the audio codec has a fixed frame size (referred to here as a codec-enforced frame size), the encoded audio requires that the last frame 104 be padded with zeros when the portion's sample count is not evenly divisible by the number of samples per frame according to the codec-enforced frame size. For example, when using a sampling rate of 48 kHz, 96,000 samples are generated for a two-second audio portion. Dividing the number of samples, 96,000, by the number of samples per frame (for example, 1,024 samples for AAC-LC and 2,048 samples for High Efficiency AAC (HE-AAC)) yields 93.75 frames. Since 93.75 is not an integer, the audio codec pads the last frame 104 with zeros. In this example, the last 256 samples of the last frame are given a zero value. Although zero values represent silent audio, padding the last frame with zeros results in a click or pop noise at the portion boundary when the encoded audio portion is played back. The zero transitions 102 and the zero-padded last frames 104 introduce boundary artifacts. The introduction of boundary artifacts can decrease the overall audio quality, affecting the user experience when the media content is played.
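The arithmetic behind this padding can be made concrete with a minimal sketch (the variable names are illustrative, not taken from the patent):

```python
import math

SAMPLE_RATE = 48_000       # Hz
PART_DURATION = 2.0        # seconds, the fixed time duration
SAMPLES_PER_FRAME = 1024   # codec-enforced frame size for AAC-LC

samples = int(SAMPLE_RATE * PART_DURATION)          # 96,000 samples per part
frames = samples / SAMPLES_PER_FRAME                # 93.75 -> not an integer
emitted_frames = math.ceil(frames)                  # the codec emits 94 frames
padding = emitted_frames * SAMPLES_PER_FRAME - samples
print(emitted_frames, padding)                      # 94 frames, 256 zero samples
```

The 256 zero-valued samples at the end of the last frame are exactly the padding that produces the audible click at the portion boundary.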
[0006] Another conventional approach attempts to limit the number of boundary artifacts by using audio portions of longer duration, in order to align with frame boundaries. However, when a longer audio portion is used, the audio and video may have to be packaged separately. This can be a drawback for streaming media content having audio and video, especially when the same media content is encoded at different quality levels, for example, as used in the context of adaptive streaming, which allows switching between the different quality levels when playing the media content.

BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The invention may be best understood by reference to the following description and the accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:
[0008] Figure 1 is a diagram illustrating an exemplary audio waveform for two audio portions using a conventional approach.
[0009] Figure 2 is a schematic block diagram illustrating one embodiment of a computing environment in which an encoder of the present embodiments may be employed.
[0010] Figure 3A is a schematic block diagram illustrating another embodiment of a computing environment in which an encoding system may be employed, including multiple hosts each employing the encoder of Figure 2.
[0011] Figure 3B is a schematic block diagram illustrating one embodiment of parallel encoding of streamlets.
[0012] Figure 4 is a flow diagram of one embodiment of a method of encoding the audio of media content according to codec-enforced frame sizes in order to split complete audio frames between content files having fixed-time video portions of the media content.
[0013] Figures 5A-5C are flow diagrams of one embodiment of a method of generating content files with fixed-time video portions and complete audio frames having codec-enforced frame sizes.
[0014] Figure 6A is a diagrammatic representation of audio portions, video portions, and streamlets according to one embodiment of audio splitting.
[0015] Figure 6B is a diagram illustrating one embodiment of an audio waveform for four audio portions using audio splitting.
[0016] Figure 7 illustrates a schematic representation of a machine in the exemplary form of a computer system for audio splitting according to one embodiment.

DETAILED DESCRIPTION
[0017] A method and apparatus for splitting the audio of media content into separate content files without introducing boundary artifacts are described. In one embodiment, a method, implemented by a computer system programmed to perform operations, includes receiving media content including audio and video, encoding the video according to a frame rate, encoding the audio according to a codec-enforced frame size (that is, a fixed frame size), and generating content files, each of which includes a fixed-time encoded video portion and an encoded audio portion having complete audio frames of the codec-enforced frame size. In one embodiment, the last of the audio frames is not padded with zeros as is done conventionally.
[0018] Embodiments of the present invention provide an improved approach to audio splitting. Unlike conventional approaches that use a new encoding session for each audio portion of the media content, the embodiments described here allow the media content to be split into small portions without introducing boundary artifacts. The embodiments described here segment the audio using complete audio frames. When the audio is stitched back together for playback, it is presented to the decoder as a single stream, instead of many small segments having boundary artifacts. In the embodiments described here, the encoder is made aware of the codec's frame size (for example, 1,024 samples for AAC-LC or 2,048 samples for HE-AAC) and of how many audio frames are produced with each invocation of the codec. The encoder buffers as many audio frames as can fit into an encoded streamlet (that is, a content file), which has a video portion of fixed time duration. Instead of padding the last audio frame with zeros, a complete frame of the next audio portion is encoded and added to the current streamlet. This results in a small amount of audio that would otherwise be in the subsequent streamlet being written into the current streamlet instead. The subsequent streamlet is then given a time offset for the audio stream to indicate a gap, so that the audio can be presented to the decoder as a continuous stream when played. This same amount of time is deducted from the target audio duration for that streamlet. If the end of the audio for that subsequent streamlet does not fall on a frame boundary, audio is again borrowed from the following streamlet to fill the final frame. This process is repeated until the end of the media content stream is reached. Gaps inserted at the beginning of streamlets from which audio was borrowed can be eliminated when the audio portions of the streamlets are stitched together before decoding and playback. When seeking to a random streamlet, silent audio can be played for the duration of the gap in order to maintain audio/video synchronization.
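The borrowing loop just described can be sketched in a few lines. This is a minimal illustration under the assumptions of a constant sample rate and a complete input stream, not the patent's implementation:

```python
import math

def split_audio(total_samples: int, samples_per_streamlet: int,
                samples_per_frame: int):
    """Yield (frame_count, presentation_offset_in_samples) per streamlet.

    Whole frames are borrowed from the following portion instead of
    zero-padding, so the streamlets decode as one continuous stream
    when stitched back together."""
    sample_offset = 0   # samples already borrowed into earlier streamlets
    emitted = 0
    while emitted < total_samples:
        needed = samples_per_streamlet - sample_offset
        frames = math.ceil(needed / samples_per_frame)
        yield frames, sample_offset       # the offset marks the playback gap
        sample_offset = frames * samples_per_frame - needed
        emitted += frames * samples_per_frame

# Eight seconds of 48 kHz audio in two-second streamlets, 1024-sample frames:
for n, (frames, gap) in enumerate(split_audio(384_000, 96_000, 1024), 1):
    print(n, frames, gap)   # (94, 0), (94, 256), (94, 512), (93, 768)
```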
[0019] The audio splitting embodiments described here provide the ability to encode the audio of media content using audio codecs with large codec-enforced frame sizes (AAC, AC3, etc.) without introducing boundary artifacts, while still maintaining the same fixed time duration for the video.
[0020] In the following description, numerous details are set forth. It will be apparent, however, to a person skilled in the art having the benefit of this disclosure, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the embodiments of the present invention.
[0021] Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
[0022] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as is apparent from the discussion below, discussions throughout the description using terms such as "receiving", "encoding", "generating", "splitting", "processing", "computing", "calculating", "determining", "displaying", or the like, refer to the actions and processes of a computer system, or similar electronic computing systems, that manipulate and transform data represented as physical (for example, electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's registers or memories, or other such information storage, transmission, or display devices.
[0023] Embodiments of the present invention also relate to an apparatus for performing the operations described here. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer system specifically programmed by a computer program stored in the computer system. Such a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
[0024] The term "encoded streamlet," as used here, refers to a single encoded representation of a portion of media content. Each streamlet may be an individual content file that includes a portion of the media and may be encapsulated as an independent media object, allowing the streamlet to be cached individually and to be independently requestable and independently playable by a media player. These individual files are also referred to here as QSS files. In one embodiment, a streamlet is a static file that can be served by a non-specialized server, instead of a specialized media server. In one embodiment, the media content in a streamlet may have a predetermined length of playback time (also referred to as the fixed time duration). The predetermined length of time may be in the range of approximately 0.1 to 8.0 seconds, for example. Alternatively, other predetermined lengths may be used. The media content in the streamlet may have a unique time index relative to the beginning of the media content contained in the stream. The file name may include part of the time index. Alternatively, streamlets may be divided according to a file size instead of a time index. The term "stream," as used here, may refer to a set of streamlets of the media content encoded with the same video quality profile, for example, video portions that have been encoded at the same video bit rate. The stream represents one copy of the original media content. Streamlets may be stored as separate files on any one or more content servers, web servers, cache servers, proxy caches, or other devices on the network, such as those found in a content delivery network (CDN). The separate files (for example, streamlets) may be requested by the client device from a web server using HTTP. The use of a standard protocol, such as HTTP, eliminates the need for network administrators to configure firewalls to recognize and pass through network traffic for a new, specialized protocol, such as the Real Time Streaming Protocol (RTSP). In addition, once the media player initiates a request, a web server, for example, is only asked to retrieve and serve the requested streamlet, not the entire stream. The media player can also retrieve streamlets from more than one web server. These web servers need no specialized server-side intelligence to retrieve the requested portions. In another embodiment, the streamlets are stored as separate files on a cache server of a network infrastructure operator (for example, an ISP), or on other components of a CDN. Although some of the present embodiments describe the use of streamlets, the embodiments described here are not limited to use in computer systems that use streamlets, but may also be implemented in other systems that use other techniques for delivering live media content over the Internet. For example, in another embodiment, the media content is stored in a single file that is divided into portions that can be requested using HTTP range requests and cached in the CDN.
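Because each streamlet is an ordinary static file, fetching one requires nothing beyond plain HTTP. A minimal sketch follows; the URL layout, file extension placement, and function name are hypothetical, not defined by the patent:

```python
import urllib.request

def fetch_streamlet(base_url: str, profile: str, time_index: int) -> bytes:
    """Fetch one QSS file from a web server or CDN edge cache over HTTP.

    Assumes a hypothetical naming scheme in which the quality profile
    and the portion's time index are embedded in the file name."""
    url = f"{base_url}/{profile}/{time_index:08d}.qss"
    with urllib.request.urlopen(url) as response:
        return response.read()

# Example: the two-second portion at time index 450, at a mid quality level.
# data = fetch_streamlet("http://cdn.example.com/event42", "profile_05", 450)
```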
[0025] There are two general types of media streaming: push-based streaming and pull-based streaming. Push technology describes an Internet-based communication method in which the server, such as a publisher's content server, initiates the request for a given transaction. Pull technology, by contrast, describes an Internet-based communication method in which the request for transmission of information is initiated by the client device and then answered by the server. One type of request in pull technology is an HTTP request (for example, an HTTP GET request). In contrast, in push-based technology, a specialized server typically uses a specialized protocol, such as RTSP, to push data to the client device. Alternatively, some push-based technologies may use HTTP to deliver the media content. In pull-based technology, a CDN may be used to deliver the media to multiple client devices.
[0026] It should be noted that, although several embodiments described here are directed to a pull-based model, the embodiments may be implemented in other configurations, such as a push-based configuration. In the push-based configuration, audio splitting by the encoder can be performed in a similar manner as in the pull-based configuration described in relation to Figure 2, and the encoded content file(s) can be stored on a content server, such as a media server, to deliver the media content to the client device for playback using push-based technologies. It should also be noted that these embodiments can be used to provide different quality levels for the media content and to allow switching between the different quality levels, commonly referred to as adaptive streaming. One difference may be that, in the push-based model, the media server determines which content files to send to the client device, whereas in the pull-based model, the client device determines which content file(s) to request from the content server.
[0027] Figure 2 is a schematic block diagram illustrating one embodiment of a computing environment 200 in which an encoder 220 of the present embodiments may be employed. Computing environment 200 includes a source 205, encoder 220, a source content server 210 (also referred to as a media server or origin server) of a content delivery network 240, and media players 200, each operating on a client device 204. Content server 210, encoder 220, and client devices 204 may be coupled via a data communications network. The data communications network may include the Internet. Alternatively, content server 210, encoder 220, and client devices 204 may be located on a common local area network (LAN), personal area network (PAN), campus area network (CAN), metropolitan area network (MAN), wide area network (WAN), wireless local area network, cellular network, virtual local area network, or the like. The client device 204 may be a client workstation, a server, a computer, a portable electronic device, an entertainment system configured to communicate over a network, such as a set-top box, a digital receiver, a digital television, or other electronic devices. For example, portable electronic devices may include, but are not limited to, cell phones, portable gaming systems, portable computing devices, or the like. The client device 204 may have access to the Internet through a firewall, a router, or other packet-switching devices.
[0028] In the depicted embodiment, source 205 may be a publisher server or a publisher content repository. Source 205 may be a creator or distributor of media content. For example, if the media content to be streamed is a broadcast of a television program, source 205 may be a server of a television or cable network channel, such as the ABC® channel or the MTV® channel. The publisher may transfer the media content over the Internet to encoder 220, which may be configured to receive and process the media content and store the content file(s) of the media content on the source content server 210. In this embodiment, content server 210 delivers the media content to client device 204, which is configured to play the content in a media player operating on client device 204. Content server 210 delivers the media content by streaming it to client device 204. In a further embodiment, client device 204 is configured to receive different portions of the media content from multiple locations simultaneously or concurrently, as described in more detail below.
[0029] Media content stored on content server 210 may be replicated to other web servers or, alternatively, to proxy cache servers of CDN 240. Replication may take place through deliberate forwarding from content server 210, or through a web, cache, or proxy server outside content server 210 requesting content on behalf of client device 204. For example, client device 204 may request and receive content from any of several web servers, edge caches, or proxy cache servers. In the depicted embodiment, the web servers, proxy caches, edge caches, and content server 210 are organized in a hierarchy of CDN 240 to deliver the media content to client device 204. A CDN is a system of networked computers on the Internet that cooperate transparently to deliver content, and it may include, for example, one or more origin content servers, web servers, cache servers, edge servers, etc. Typically, the CDN is configured in a hierarchy so that a client device requests the data from an edge cache, for example, and if the edge cache does not contain the requested data, the request is sent to an origin cache, and so on up to the origin content server. The CDN may also include interconnected computer networks or nodes for delivering the media content. Some examples of CDNs would be those developed by Akamai Technologies, Level 3 Communications, or Limelight Networks. Alternatively, other types of CDNs may be used. In other embodiments, the source content server 210 may deliver the media content to client devices 204 using other configurations, as would be appreciated by one of ordinary skill in the art having the benefit of this disclosure.
[0030] In one embodiment, the publisher stores the media content in an original content file to be distributed from source 205. The content file may include data corresponding to video and/or audio of a television broadcast, a sporting event, a movie, music, a concert, or the like. The original content file may include uncompressed video and audio or, alternatively, uncompressed video or audio. Alternatively, the content file may include content (for example, video and/or audio) compressed using standard or proprietary encoding schemes. The original content file from source 205 may be in digital form and may include media content having a high bit rate, such as, for example, about 5 Mbps or greater.
[0031] In the depicted embodiment, encoder 220 receives the original media content 231 from source 205, for example, by receiving an original content file, a signal from a direct feed of a live event broadcast, a live TV event stream, or the like. Encoder 220 may be implemented on one or more machines, including one or more server computers, gateways, or other computing devices. In one embodiment, encoder 220 receives the original media content 231 as one or more content files from a publishing system (not shown) (for example, a publisher server or publisher content repository). Alternatively, encoder 220 may receive the original media content 231 as it is captured. For example, encoder 220 may receive a direct feed of a live TV broadcast, such as a captured broadcast, in the form of a stream or signal. The original media content 231 may be captured by a capture card configured for television and/or video capture, such as, for example, the RDC-2600 capture card available from Digital Rapids of Ontario, Canada. Alternatively, any capture card capable of capturing audio and video may be used with the present invention. The capture card may be located on the same server as the encoder or, alternatively, on a separate server. The original media content 231 may be a captured broadcast, such as a broadcast that is being broadcast simultaneously over the air, cable, and/or satellite, or a pre-recorded broadcast that is scheduled to be played at a specific point in time according to a live event schedule. Encoder 220 may use encoding schemes such as the DivX® codec, the Windows Media Video 9® series codec, the Sorenson Video® 3 video codec, the On2 Technologies® TrueMotion VP7 codec, MPEG-4 video codecs, the H.263 video codec, the RealVideo 10 codec, OGG Vorbis, MP3, or the like. Alternatively, a custom encoding scheme may be used.
[0032] In another embodiment, encoder 220 receives the original media content 231 as video and audio portions of fixed time durations, for example, two-second pieces (referred to here as portions of the media content). The two-second pieces may include raw audio and raw video. Alternatively, the two-second pieces may be raw video and encoded audio. In such cases, encoder 220 decompresses the media content. In another embodiment, encoder 220 receives the original media content 231 as multiple raw streamlets, each raw streamlet containing a fixed-time portion of the media content (for example, multiple two-second raw streamlets containing raw audio and video). As used here, "raw streamlet" refers to a streamlet that is uncompressed or lightly compressed to substantially reduce its size with no significant loss in quality. A lightly compressed raw streamlet can be transmitted more quickly. In another embodiment, encoder 220 receives the original media content 231 as a stream or signal and segments the media content into fixed-time portions of the media content, such as raw streamlets.
[0033] In the depicted embodiment, encoder 220 includes a splitter 222, a fixed-frame audio encoder 224, an audio frame buffer 225, a fixed-time video encoder 226, a video frame buffer 227, and an audio-splitting multiplexer 228. Splitter 222 receives the original media content 231, for example, as streamed audio and video, and splits the media content 231 into raw audio 233 and raw video 235. In one embodiment, the fixed-frame audio encoder 224 is an audio codec. In one embodiment, splitter 222 splits the audio and video stream into two-second pieces of audio and video. A codec (also referred to as a compressor-decompressor or encoder-decoder) is a device or computer program capable of encoding and/or decoding a digital signal or data stream. In one embodiment, fixed-frame audio codec 224 is executed in software by one or more computing devices of encoder 220 to encode the raw audio 233. Alternatively, fixed-frame audio codec 224 may be hardware logic used to encode the raw audio 233. In particular, the fixed-frame audio encoder 224 receives the raw audio 233 and encodes the audio according to a codec-enforced frame size, for example, 1,024 samples for AAC-LC or 2,048 samples for HE-AAC. Fixed-frame audio encoder 224 outputs encoded audio frames 237 to the audio frame buffer 225. Likewise, fixed-time video encoder 226 receives the raw video 235 from splitter 222, but encodes the video according to fixed time durations, for example, 60 frames every two seconds (30 frames per second (fps)). The fixed-time video encoder 226 outputs encoded video frames 239 to the video frame buffer 227. In one embodiment, fixed-time video codec 226 is executed in software by one or more computing devices of encoder 220 to encode the raw video 235. Alternatively, fixed-time video codec 226 may be hardware logic used to encode the raw video 235.
[0034] The audio-splitting multiplexer 228 generates encoded media content files 232 (referred to here as QSS files) using the encoded audio frames 237 and encoded video frames 239. As described above, a conventional encoder generates a content file with a video portion and an audio portion, each having a fixed time duration, in which the last audio frame is padded with zeros because the portion's number of samples is not divisible by the number of samples per frame according to the codec-enforced frame size used by the audio codec. Unlike the conventional encoder that pads the last frame, the audio-splitting multiplexer 228 uses complete audio frames to generate content files that have a fixed-time video portion and an audio portion with complete audio frames of the codec-enforced frame size. Since the audio-splitting multiplexer 228 uses complete audio frames to fill the content files 232, it does not pad the last samples of the frame with zeros as is conventionally done, but instead encodes part of the next audio portion in order to add a complete frame to the current content file 232.
[0035] In one embodiment, the audio-splitting multiplexer 228 tracks a sample offset that represents the number of samples borrowed from the subsequent portion, in order to determine how many frames to use for the subsequent content file. The audio-splitting multiplexer 228 also tracks a presentation offset that indicates a gap in the audio playback. Since the samples that would have been played as part of the subsequent content file are part of the current content file, the presentation offset of the subsequent content file indicates the gap in audio playback, so that the audio portions of the current and subsequent content files are presented to the decoder as a continuous stream. In essence, during audio playback, the gaps inserted at the beginning of the content files can be eliminated when the audio portions of the content files are stitched together before decoding and playback. The presentation offset allows the audio to be presented to the decoder as a continuous stream instead of many small segments having boundary artifacts. In one embodiment, when seeking to a random portion of the video, silent audio can be played for the duration of the gap in order to maintain audio/video synchronization.
[0036] In one embodiment, the audio-splitting multiplexer 228 generates a first content file by filling the first content file with a first video portion (for example, 60 frames) having a fixed time duration (for example, 2 seconds) and a first audio portion having a number of buffered audio frames. The duration of the buffered audio frames is longer than the fixed time duration.
[0037] In one embodiment, the audio-splitting multiplexer 228 generates the content files 232 by determining the number of encoded audio frames 237 needed to fill the current content file. In one embodiment, the number of frames is the smallest integer that is not less than the number of samples needed to fill the current content file divided by the codec-enforced frame size (that is, the samples per frame). In one embodiment, this number can be calculated using a ceiling function that maps a real number to the next largest integer, that is, ceiling(x) is the smallest integer not less than x. An example of the ceiling function is represented by the following equation (1):

ceiling((samplesPerStreamlet - sampleOffset) / samplesPerFrame) (1)

Alternatively, other equations can be used.
[0038] The audio-splitting multiplexer 228 determines whether there are enough encoded audio frames 237 in the audio frame buffer 225 to fill the current content file. If there are enough buffered encoded frames, the audio-splitting multiplexer 228 fills the current content file with the determined number of frames. If there are not enough buffered encoded frames, the audio-splitting multiplexer 228 waits until enough encoded frames are stored in buffer 225 and then fills the current content file with the determined number of encoded frames stored in buffer 225. In one embodiment, the audio-splitting multiplexer 228 determines whether there are enough buffered encoded frames by 1) multiplying the number of buffered frames by the samples per frame, 2) adding the sample offset, if any, from a previous content file to the product of the multiplication, and 3) determining whether the sum is greater than or equal to the number of samples needed to fill the current content file. An example of this operation is represented by the following equation (2):

bufferedFrames * samplesPerFrame + sampleOffset >= samplesPerStreamlet (2)
[0039] The audio-splitting multiplexer 228 determines a sample offset, if any, for a subsequent content file. In one embodiment, the audio-splitting multiplexer 228 determines the sample offset by multiplying the number of encoded frames by the codec-enforced frame size (that is, the samples per frame), minus the number of samples needed to fill the current content file, plus the sample offset, if any, from a previous content file. An example of this operation is represented by the following equations (3) and (4):

sampleOffset(subsequent) = framesToSend * samplesPerFrame - samplesPerStreamlet + sampleOffset(previous) (3)

where the number of frames to send is

framesToSend = ceiling((samplesPerStreamlet - sampleOffset) / samplesPerFrame) (4)
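Under the naming above, equations (1)-(4) translate directly into code. This is a sketch of the bookkeeping, not the patent's implementation:

```python
import math

def frames_to_send(samples_per_streamlet: int, sample_offset: int,
                   samples_per_frame: int) -> int:
    # Equations (1) and (4): the smallest whole number of complete frames
    # that covers the samples still needed for the current content file.
    return math.ceil((samples_per_streamlet - sample_offset)
                     / samples_per_frame)

def enough_buffered(buffered_frames: int, sample_offset: int,
                    samples_per_frame: int, samples_per_streamlet: int) -> bool:
    # Equation (2): do the buffered frames, plus any carried-over samples,
    # cover the current content file?
    return (buffered_frames * samples_per_frame + sample_offset
            >= samples_per_streamlet)

def next_sample_offset(frames: int, samples_per_frame: int,
                       samples_per_streamlet: int, sample_offset: int) -> int:
    # Equation (3): samples borrowed from the subsequent portion.
    return (frames * samples_per_frame
            - samples_per_streamlet + sample_offset)
```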
[0040] In another embodiment, the audio-splitting multiplexer 228 generates the content files 232 by calculating the number of samples needed (for example, 96,000) to fill a current content file. The audio-splitting multiplexer 228 calculates the number of frames needed for the current content file (for example, 93 frames for a 48 kHz sample rate and two-second pieces) and adds one frame to the number of frames (for example, for a total of 94 frames) when the number of samples is not evenly divisible by the samples per frame. In effect, it rounds the number of frames up to the next largest integer. The audio-splitting multiplexer 228 fills the current content file with the rounded-up number of frames.
[0041] In another embodiment, the audio-splitting multiplexer 228 generates the content files 232 by calculating the number of samples needed (for example, 96,000) to fill a current content file by multiplying the sample rate (for example, 48 kHz) by the fixed time duration (for example, 2 seconds). The audio-splitting multiplexer 228 calculates the number of frames needed for the current content file by dividing the number of samples by the codec-enforced frame size (for example, 1,024 samples per frame). If the remainder of the division is zero, the audio-splitting multiplexer 228 fills the current content file with that number of frames. However, if the remainder of the division is greater than zero, the audio-splitting multiplexer 228 increments the number of frames by one and fills the current content file with the incremented number of frames.
[0042] In a further embodiment, the audio-splitting multiplexer 228 generates the content files 232 by multiplying the number of frames by the codec-enforced frame size to convert back to the number of samples used to fill the current content file, and by calculating an audio duration of the current content file by dividing the number of samples by the sample rate (for example, streamletDuration = samplesPerStreamlet / sampleRate). The audio-splitting multiplexer 228 determines the presentation offset of a subsequent content file by subtracting the fixed time duration from this audio duration. The audio-splitting multiplexer 228 updates the sample offset for the subsequent content file by multiplying the number of frames by the codec-enforced frame size, minus the number of samples used to fill the current content file, plus the sample offset, if any, from a previous content file (for example, equation (3)).
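In the running example, the presentation offset carried by each content file works out to the accumulated sample offset divided by the sample rate. A small sketch of this bookkeeping, with illustrative names and the offset tracked in samples:

```python
SAMPLE_RATE = 48_000        # Hz
FIXED_DURATION = 2.0        # seconds of video per content file

def audio_metadata(frames: int, samples_per_frame: int,
                   incoming_sample_offset: int):
    """Return (audio_duration_s, presentation_offset_s) for one file."""
    samples = frames * samples_per_frame
    duration = samples / SAMPLE_RATE                      # e.g. 96_256 / 48_000
    presentation_offset = incoming_sample_offset / SAMPLE_RATE
    return duration, presentation_offset

# Second streamlet of the example: 94 frames, 256 samples already borrowed.
print(audio_metadata(94, 1024, 256))   # (2.0053..., 0.00533... -> ~5.33 ms)
```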
[0043] Returning to Figure 2, in one embodiment, when splitter 222 receives the original media content 231 as raw streamlets, splitter 222 receives the first and second raw streamlets and splits the audio and video of each. The fixed-time video encoder 226 encodes the video of the first and second raw streamlets, and the audio-splitting multiplexer 228 stores the encoded video of the first raw streamlet in a first content file and the encoded video of the second raw streamlet in a second content file. The fixed-frame audio encoder 224 encodes the audio of the first raw streamlet into a first set of audio frames and stores the first set in the audio frame buffer 225. The audio-splitting multiplexer 228 determines whether there are enough buffered frames to fill the first content file. If not, the fixed-frame audio encoder 224 encodes the audio of the second raw streamlet into a second set of audio frames and stores the second set in the audio frame buffer 225. When there are enough buffered frames (in some cases, once one more complete frame is stored in buffer 225) to fill the first content file, the audio-splitting multiplexer 228 stores the buffered audio frames in the first content file. Encoder 220 continues this process until the end of the media content.
[0044] Furthermore, since the audio-splitting multiplexer 228 uses complete audio frames, the audio frames in a content file 232 do not necessarily align with the boundaries of the video portions, as illustrated in Figures 6A and 6B. For example, the duration of the audio portion of a content file 232 may be 2.0053 seconds, while the fixed duration of the video portion of the content file 232 may be 2.00 seconds. In this example, the codec-enforced frame size is 1,024 samples per frame, the audio sample rate is 48 kHz, and there are 96,256 samples in the 94 frames stored in the audio portion of the content file 232. Since there is an extra 5.3 milliseconds (ms) of audio in the content file 232, the audio-splitting multiplexer 228 gives the next content file a 5.3 ms presentation offset, since the current content file 232 uses samples that would otherwise have been in the next content file under a fixed-time-duration audio encoding scheme. The audio-splitting multiplexer 228 also tracks the sample offset to determine how many audio frames are needed to fill the next content file. In one embodiment, the audio-splitting multiplexer 228 fills each of the content files with one of the encoded video portions having a fixed time duration (for example, 2 seconds, or 60 video frames when the frame rate is 30 frames per second). The audio-splitting multiplexer 228 fills each of the content files with a number of buffered audio frames whose duration may be longer than, shorter than, or equal to the fixed time duration, depending on whether the audio frames align with the boundaries of the video portions, as determined by the audio-splitting multiplexer 228.
[0045] Referring to Figure 6A, in one embodiment, the audio-splitting multiplexer 228 generates a first streamlet (that is, a content file) 601 by filling the first streamlet 601 with a first video portion 611 having 60 video frames, whose duration is equal to the fixed time duration of two seconds, and with a first audio portion 621 having ninety-four audio frames, each having 1,024 samples per frame, totaling 96,256 samples. The duration of the first audio portion 621 is about 2.0053 seconds. The audio-splitting multiplexer 228 determines that the presentation offset 631 of the first audio portion 621 of the first streamlet 601 is zero, since the audio and video boundaries 652 and 654 of the first streamlet 601 are aligned for playback.
[0046] The audio-splitting multiplexer 228 generates a second streamlet 602 by filling the second streamlet 602 with a second video portion 612 (60 frames and two seconds) and with a second audio portion 622 having ninety-four audio frames. The duration of the second audio portion 622 is about 2.0053 seconds. The audio-splitting multiplexer 228 determines that the presentation offset 632 of the second audio portion 622 of the second streamlet 602 is about 5.3 milliseconds (ms), since the duration of the first audio portion 621 of the first streamlet 601 is about 2.0053 seconds. The presentation offset indicates an audio gap between the first and second streamlets 601 and 602. As shown in Figure 6B, the audio and video boundaries 652 and 654 of the second streamlet 602 are not aligned for playback. The presentation offset can be used to allow the audio portions of the first and second streamlets 601 and 602 to be stitched together for presentation to the decoder as a continuous stream.
[0047] The audio-splitting multiplexer 228 generates a third streamlet 603 by filling the third streamlet 603 with a third video portion 613 (60 frames and two seconds) and with a third audio portion 623 having ninety-four audio frames. The duration of the third audio portion 623 is about 2.0053 seconds. The audio-splitting multiplexer 228 determines that the presentation offset 633 of the third audio portion 623 of the third streamlet 603 is approximately 10.66 ms, since the duration of the second audio portion 622 of the second streamlet 602 is about 2.0053 seconds. The presentation offset indicates an audio gap between the second and third streamlets 602 and 603. As shown in Figure 6B, the audio and video boundaries 652 and 654 of the third streamlet 603 are not aligned for playback. The presentation offset can be used to allow the audio portions of the second and third streamlets 602 and 603 to be stitched together for presentation to the decoder as a continuous stream.
[0048] The audio-splitting multiplexer 228 generates a fourth streamlet 604 by filling the fourth streamlet 604 with a fourth video portion 614 (60 frames and two seconds) and a fourth audio portion 624 having ninety-three audio frames. The duration of the fourth audio portion 624 is approximately 1.984 seconds. The audio-splitting multiplexer 228 determines that the presentation offset 634 of the fourth audio portion 624 of the fourth streamlet 604 is approximately 16 ms, since the duration of the third audio portion 623 of the third streamlet 603 is about 2.0053 seconds. The presentation offset indicates an audio gap between the third and fourth streamlets 603 and 604. As shown in Figure 6B, the audio and video boundaries 652 and 654 of the fourth streamlet 604 are not aligned for playback. The presentation offset can be used to allow the audio portions of the third and fourth streamlets 603 and 604 to be stitched together for presentation to the decoder as a continuous stream. After the fourth streamlet 604, however, the audio and video boundaries 652 and 654 are aligned; that is, the fifth streamlet (not shown) will have a presentation offset of zero. It should be noted that the embodiments of Figures 6A and 6B assume that the sample rate is 48 kHz, the fixed time duration is two seconds, and the codec-enforced frame size is 1,024 samples per frame.
[0049] In the embodiment described above, the audio portions of the first three streamlets 601-603 have ninety-four audio frames each, and the audio portion of the fourth streamlet 604 has ninety-three audio frames. In this embodiment, each of the video portions of the four content files 601-604 has approximately 60 video frames, when the video is encoded at thirty frames per second. This pattern repeats until the end of the media content is reached. It should be noted that, in this embodiment, after every fourth content file the presentation offset and sample offset are zero, meaning that the audio boundaries 652 and the video boundaries 654 align after every fourth content file.
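The 94/94/94/93 pattern and the growing presentation offsets of Figures 6A and 6B can be reproduced with a few lines; this is a sketch under the stated assumptions of 48 kHz audio, two-second portions, and 1,024-sample frames:

```python
import math

sample_rate, samples_per_frame, samples_per_streamlet = 48_000, 1024, 96_000
offset = 0  # sample offset carried between streamlets
for n in range(1, 5):
    frames = math.ceil((samples_per_streamlet - offset) / samples_per_frame)
    print(f"streamlet {n}: {frames} frames, "
          f"presentation offset {offset / sample_rate * 1000:.2f} ms")
    offset = frames * samples_per_frame - samples_per_streamlet + offset
# streamlet 1: 94 frames, presentation offset 0.00 ms
# streamlet 2: 94 frames, presentation offset 5.33 ms
# streamlet 3: 94 frames, presentation offset 10.67 ms
# streamlet 4: 93 frames, presentation offset 16.00 ms
```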
[0050] As can be seen in Figure 6B, after eight seconds of media content, the video and audio boundaries align. As such, another approach to decreasing the frequency of the boundary artifacts and aligning with the AAC frame sizes would be to use eight seconds as the fixed time duration. However, this approach has the following disadvantages: 1) It requires large video piece sizes, such as 8, 16, or 32 seconds. 2) It ties the implementation to a specific frame size, that is, 1,024 samples per frame. If the frame size were to change, say to 2,048, this approach would have to switch to an audio codec with a different frame size, and it would also have to change the length of the video piece. 3) It requires that the audio sample rate always be 48 kHz. Other common sample rates, such as 44.1 kHz, would require a different, potentially much larger piece size. Alternatively, the audio would have to be resampled to 48 kHz. Resampling, however, can introduce artifacts and can reduce the efficiency of the audio codec. The embodiments described here, in contrast, have the ability to encode using audio codecs with large frame sizes (AAC, AC3, etc.) without introducing piece-boundary artifacts, while maintaining the same piece duration.
[0051] Alternatively, other audio sample rates (for example, 44.1 kHz), fixed time durations (for example, 0.1-5.0 seconds), video frame rates (for example, 24 fps, 30 fps, etc.), and/or codec-enforced frame sizes (for example, 2,048) can be used. Video from different sources uses different frame rates. Most over-the-air signals in the US are 30 frames per second (29.97, in fact). Some HD signals are 60 frames per second (59.94). Some file-based content is 24 frames per second. In one embodiment, encoder 220 does not increase the video frame rate, since that would require encoder 220 to generate additional frames, and generating the additional frames provides little benefit for the added cost. So, for example, if the original media content has a frame rate of 24 fps, encoder 220 uses a frame rate of 24 fps instead of oversampling to 30 fps. However, in some embodiments, encoder 220 may subsample the frame rate. For example, if the original media content has a frame rate of 60 fps, encoder 220 can subsample to 30 fps. This can be done because using 60 fps doubles the amount of data to be encoded at the target bit rate, which can make the quality worse. In one embodiment, once encoder 220 determines the frame rate to be used, either as received or after subsampling (usually 30 fps or 24 fps), encoder 220 uses this frame rate for most quality profiles. Some of the quality profiles, such as the lowest quality profile, can use a lower frame rate. However, in other embodiments, encoder 220 may use different frame rates for different quality profiles, such as to target cell phones and other devices with limited resources, such as less computational power. In those cases, it may be advantageous to have more profiles with lower frame rates.
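The frame-rate policy just described (never upsample, subsample very high rates) can be summarized in a short sketch; the halving rule for 60 fps input follows the document's example and is not a general requirement:

```python
def output_frame_rate(source_fps: float) -> float:
    """Pick the encoding frame rate: never generate frames the source
    does not have; halve very high rates so the target bit rate is not
    spread over twice the data (e.g. 59.94 -> 29.97)."""
    if source_fps >= 60.0:
        return source_fps / 2
    return source_fps          # e.g. 24 or 30 fps pass through unchanged
```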
[0052] It should be noted that, when other values are used for these parameters, the audio boundaries 652 and the video boundaries 654 can differ from the illustrated embodiment of Figure 6B. For example, when using a 44.1 kHz sample rate, a codec-enforced frame size of 1,024, and two seconds for the fixed time duration, the audio portion of the first content file will have eighty-seven audio frames, and the second through seventh content files will have eighty-six audio frames. This pattern repeats until there is not enough video remaining in the media content. It should be noted that, in this embodiment, after every 128 content files the presentation offset and the sample offset are zero, meaning that the audio boundaries 652 and the video boundaries 654 align after every 128th content file, as shown in the abbreviated Table 1-1.

Table 1-1
Content file    Audio frames    Sample offset (samples)
1               87              888
2               86              752
3               86              616
4               86              480
5               86              344
6               86              208
7               86              72
8               87              960
...             ...             ...
127             86              136
128             86              0
It should be noted that the sample offset in the table above is given in units of samples, not in seconds or milliseconds, for ease of illustration. To convert the sample offset to the presentation offset, the sample offset can be divided by 44,100 to obtain the presentation offset in seconds, and multiplied by 1,000 to obtain the presentation offset in milliseconds. In one embodiment, the presentation offset in milliseconds can be stored in the streamlet header. Alternatively, the presentation offset or sample offset can be stored in the streamlet header in other units.
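The frame counts and sample offsets in Table 1-1 follow mechanically from equations (3) and (4); a short sketch that regenerates the abbreviated table:

```python
import math

samples_per_frame, samples_per_streamlet = 1024, 88_200  # 2 s at 44.1 kHz
offset = 0
for n in range(1, 129):
    frames = math.ceil((samples_per_streamlet - offset) / samples_per_frame)
    offset = frames * samples_per_frame - samples_per_streamlet + offset
    if n <= 8 or n >= 127:
        print(n, frames, offset)
# 1 87 888 / 2 86 752 / ... / 8 87 960 / 127 86 136 / 128 86 0
```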
[0053] In another embodiment, the audio-splitting multiplexer 228 generates the encoded content files 232 by filling each of the content files 232 with encoded video frames 239 having a fixed time duration and with a number of complete audio frames 237, with the duration of the audio frames 237 being less than or greater than the fixed time duration in order to accommodate the use of complete audio frames in the content files 232. For example, a first content file can be filled with a video portion having a fixed time duration, such as two seconds, and with an audio portion having multiple complete audio frames whose duration is longer than the fixed time duration. Eventually, the sample offset will be large enough that fewer audio frames can be used, in which case the duration of the audio frames can be less than the fixed time duration. At times, the audio boundary will coincide with the video boundary.
[0054] In another embodiment, the audio-splitting multiplexer 228 generates the encoded content files 232 by generating a first content file having the video frames of a first video portion, the audio frames of a first audio portion, and one audio frame of a second portion. The audio-splitting multiplexer 228 generates a second content file containing the video frames of a second video portion. For the audio, the audio-splitting multiplexer 228 determines whether the audio boundary falls on the video boundary. If the audio boundary falls on the video boundary, the audio-splitting multiplexer 228 fills the second content file with the remaining audio frames of the second portion. However, if the audio boundary does not fall on the video boundary, the audio-splitting multiplexer 228 encodes an audio frame of a third portion of the media content and fills the second content file with the remaining audio frames of the second portion and the audio frame of the third portion. This process is repeated until the end of the media content is reached.
[0055] Returning to Figure 2, once encoder 220 has encoded the original media content 231, encoder 220 sends the encoded media content files 232 to the source content server 210, which delivers the encoded media content 232 to media players 200 over network connections 241. When a media player 200 receives the content files having the fixed time duration of video and the variable time duration of audio, the media player 200 uses the presentation offset of the content files to stitch the audio together for presentation to a decoder as a continuous stream, eliminating or reducing the click or pop noises caused by boundary artifacts. In essence, during audio playback, the media player 200 removes the gaps inserted at the beginning of the content files when the audio portions of the content files are stitched together before decoding and playback. In another embodiment, if audio splitting as described here is not performed and the last frame is padded with zeros, the media player 200 can be configured to remove the padded samples from the last frame before sending the audio to the decoder. However, this approach may not be practical in certain situations, for example, when the media player is provided by a third party or when access to the audio frame data after decoding is restricted.
[0056] It should be noted that, although one line has been illustrated for each media player 200, each line 241 can represent multiple network connections to CDN 240. In one embodiment, each media player 200 can establish multiple Transmission Control Protocol (TCP) network connections with CDN 240. In another embodiment, the media content is stored on multiple CDNs, for example, on the origin servers associated with each of the multiple CDNs. CDN 240 can be used for the purpose of improving performance, scalability, and cost efficiency for end users (for example, viewers) by reducing bandwidth costs and increasing the global availability of content. CDNs can be implemented in a variety of ways, and details regarding their operation would be appreciated by one of ordinary skill in the art. As such, additional details about their operation have not been included. In other embodiments, other delivery techniques can be used to deliver the media content from the origin servers to the media players, such as peer-to-peer networks, or the like.
[0057] In the embodiments described above, the content files 232 represent one copy of the original media content stream 231. However, in other embodiments, each portion of the original media content 231 can be encoded into multiple encoded representations of the same portion of content. The multiple encoded representations can be encoded according to different quality profiles and stored as separate files that are independently requestable and independently playable by the client device 204. Each file can be stored on one or more content servers 210, on web servers, proxy caches, or edge caches of CDN 240, and can be requested and delivered separately to client device 204. In one embodiment, encoder 220 simultaneously encodes the original media content 231 at several different quality levels, for example, ten or thirteen such levels. Each quality level is referred to as a quality profile, or simply a profile. For example, if the media content is one hour long and is segmented into QSS files having a duration of two seconds, there are 1,800 QSS files for each encoded representation of the media content. If the media content is encoded according to ten different quality profiles, there are 18,000 QSS files for the media content. The quality profiles can indicate how the stream is to be encoded; for example, quality profiles can specify parameters such as the width and height of the image (that is, the image size), the video bit rate (that is, the rate at which the video is encoded), the audio bit rate, the audio sample rate (that is, the rate at which the audio is sampled when captured), the number of audio streams (for example, mono, stereo, or the like), the frame rate (for example, frames per second), the streamlet sizes, or the like. For example, media players 200 can individually request different quality levels of the same media content 232; for example, each media player 200 can request the same portion (for example, the same time index) of the media content 232, but at different quality levels. For example, one media player can request a streamlet having HD-quality video, since its computing device has sufficient computational power and sufficient network bandwidth, while another media player can request a streamlet having a lower quality, since its computing device may not have sufficient network bandwidth, for example. In one embodiment, the media player 200 switches between quality levels at the portion boundaries by requesting portions from different copies (for example, different quality streams) of the media content, as described in U.S. Patent Application Publication No. US 2005/0262257, filed April 28, 2005. Alternatively, the media player 200 can request the portions using other techniques that would be appreciated by those of ordinary skill in the art having the benefit of this disclosure.
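A quality profile can be represented as a small configuration record. The concrete values below are illustrative assumptions chosen to show the parameters a profile can carry, not profiles defined by the patent:

```python
# Illustrative quality profiles; every value here is an assumption, chosen
# only to show the kinds of parameters a profile specifies (image size,
# video/audio bit rates, audio sample rate, channel count, frame rate).
QUALITY_PROFILES = [
    {"name": "profile_high", "width": 1280, "height": 720,
     "video_kbps": 2400, "audio_kbps": 128,
     "audio_sample_rate": 48_000, "audio_channels": 2, "fps": 30},
    {"name": "profile_low", "width": 320, "height": 180,
     "video_kbps": 200, "audio_kbps": 32,
     "audio_sample_rate": 44_100, "audio_channels": 1, "fps": 15},
]
```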
[0058] The encoder 220 can also specify which quality profiles are available for a particular portion of the media content and can specify how much of the media content is available for delivery, for example, using a QMX file. The QMX file indicates the current duration of the media content represented by the available QSS files. The QMX file can act as a table of contents for the media content, indicating which QSS files are available for delivery and from where the QSS files can be retrieved. The QMX file can be sent to the media player 200 through CDN 240, for example. Alternatively, the media player 200 can request the quality profiles available for the particular media content. In other embodiments, this configuration can be extended using the scaling capabilities of CDNs to deliver HTTP traffic to multiple media players 200. For example, a data center that stores the encoded media content may have a cluster of source content servers 210 to serve multiple media players requesting the encoded media content from the data center. Alternatively, other configurations can be used, as would be appreciated by one of ordinary skill in the art having the benefit of this disclosure.
[0059] In one contemplated embodiment, the media player 200 requests portions of the media content by requesting individual streamlet files (for example, the QSS files). The media player 200 requests the QSS files according to a metadata descriptor file (for example, the QMX file). The media player 200 obtains a QMX file, for example, in response to a user selecting the media content for presentation; the media player 200 reads the QMX file to determine when to start playing the media content, using the current duration, and from where to request the QSS files. The QMX file includes a QMX timestamp, such as a Coordinated Universal Time (UTC) indicator, which indicates when the encoding process started (for example, the start time of the media content), and a current duration that indicates how much of the media content is available for delivery. For example, the QMX timestamp may indicate that the encoding process started at 6:00 pm (MDT), and 4,500 QSS files of the media content may be available for delivery. The media player 200 can determine that the content duration (live playback point) is approximately fifteen minutes and decide to start requesting the QSS files corresponding to fifteen minutes into the program, or slightly before that point. In one embodiment, the media player 200 can determine the point in the media content at which it should start playing the content in order to follow the live playback point of the media content. Each time the encoder stores another set of QSS files on the content server (for example, a set of ten QSS files representing the next two seconds of media content for the ten different quality profiles), the QMX file is updated, and the updated QMX file can be obtained by the media player 200 to indicate that two more seconds are available for delivery over the Internet. The media player 200 can periodically check for updated QMX files. Alternatively, the QMX file and any updates can be pushed to the media player 200 to indicate when the media content is available for delivery over the Internet.
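Choosing the first QSS file to request from the QMX data reduces to simple arithmetic. A sketch, assuming two-second streamlets and a hypothetical back-off margin for buffering; none of these names come from the patent:

```python
import time

STREAMLET_SECONDS = 2.0

def live_start_index(qmx_start_utc: float, current_duration_s: float,
                     backoff_s: float = 6.0) -> int:
    """Pick the time index of the first QSS file for a near-live join.

    qmx_start_utc and current_duration_s come from the QMX descriptor;
    the back-off margin is an assumed buffering allowance."""
    elapsed = time.time() - qmx_start_utc          # wall-clock program position
    playable = min(elapsed, current_duration_s)    # only what is encoded so far
    return max(0, int((playable - backoff_s) // STREAMLET_SECONDS))
```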
[0060] It should be noted that, although the source content server 210 has been illustrated as being within the CDN 240, the source content server 210 may reside outside the CDN 240 and still be associated with the CDN 240. For example, one entity may own and operate the content server that stores the streamlets, while the CDN 240, whose devices may be owned and operated by one or more separate entities, delivers the streamlets.
[0061] It should be noted that the media content is data that, when processed by a media player 200 (operating on an electronic device (that is, a client device)), allows the media player 200 to present a visual and/or audio representation of an event to a viewer. The media player 200 can be a piece of software that plays the media content (for example, displays video and plays audio) and can be a standalone software application, a web browser plug-in, a combination of browser plug-in and web page logic, or the like. For example, the event may be a television broadcast, such as of a sporting event, a live or recorded performance, a live or recorded news report, or the like. A live event or scheduled television event in this context refers to media content that is scheduled to be played at a certain point in time, as dictated by a schedule. The live event may also have pre-recorded content mixed with live media content, such as slow-motion clips of important occurrences within the live event (for example, replays), which are played in the middle of a live television program. It should be noted that the embodiments described here can also be used for video-on-demand (VOD) delivery.
[0062] Figure 3A is a schematic block diagram illustrating another embodiment of a computing environment 300, in which an encoding system 320, including several hosts 314 each employing an encoder 220, can be employed. The encoding system 320 includes a master module 322 and multiple host computing modules (hereinafter "hosts") 314. Each host 314 employs an encoder 220, as described above with respect to Figure 2. The hosts 314 can be implemented on one or more personal computers, servers, etc. In a further embodiment, the hosts 314 can be dedicated hardware, for example, cards connected to a single computer.
[0063] In one embodiment, the master module (hereinafter "master") 322 is configured to receive raw streamlets 312 from the streamlet generation system 301, which includes a receiving module 302 that receives the media content from a publisher 310 and a streamlet module 303 that segments the media content into raw streamlets 312. The master module 322 stages the raw streamlets 312 for processing. In another embodiment, the master 322 can receive source streamlets that are encoded and/or compressed, and the master 322 decompresses each source streamlet to produce a raw streamlet. As used here, the term "raw streamlet" refers to a streamlet 312 that is uncompressed or only lightly compressed to substantially reduce its size without significant loss in quality. A lightly compressed raw streamlet can be transmitted more quickly and to more hosts. Each host 314 is coupled with the master 322 and configured to receive a raw streamlet from the master 322 for encoding. The hosts 314, in one example, generate multiple streamlets having identical time indices and fixed time durations but variable bit rates. In one embodiment, each host 314 is configured to generate a set 306 of encoded streamlets from the raw streamlet 312 sent from the master 322, where the encoded streamlets of the set 306 represent the same piece of media content at each supported bit rate (that is, each streamlet is encoded according to one of the available quality profiles). Alternatively, each host 314 can be dedicated to producing a single streamlet encoded at one of the supported bit rates, in order to reduce the time required for encoding.
[0064] Upon completion of encoding, the host 314 returns the set 306 to the master 322 so that the encoding system 320 can store the set 306 in the streamlet database 308. The master 322 is further configured to assign encoding tasks to the hosts 314. In one embodiment, each host 314 is configured to present a proposal for completing an encoding task (hereinafter "proposal") to the master 322. The master 322 assigns encoding tasks based on the hosts' proposals. Each host 314 generates a proposal based on multiple computing variables that can include, but are not limited to, the percentage of completion of the current encoding task, average task completion time, processor speed, physical memory capacity, or the like.
[0065] For example, a host 314 may submit a proposal indicating that, based on past performance history, the host 314 would be able to complete the encoding task in 15 seconds. The master 322 is configured to select the best proposal from among the submitted proposals and then assign the encoding task to the host 314 with the best proposal. As such, the described encoding system 320 does not require that each host 314 have identical hardware, but instead benefits from the available computational power of the hosts 314. Alternatively, the master 322 selects a host 314 on a first-come, first-served basis or by some other algorithm considered appropriate for a particular encoding task.
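For illustration, proposal-based assignment can be sketched as follows; this is a minimal sketch in which the proposal structure and the selection criterion (lowest estimated completion time) are simplifying assumptions:

```python
# Hypothetical sketch of proposal-based task assignment: each host submits an
# estimated completion time, and the master assigns the task to the best bid.
from dataclasses import dataclass

@dataclass
class Proposal:
    host_id: str
    estimated_seconds: float  # derived from load, past completion times, etc.

def assign_task(proposals: list[Proposal]) -> str:
    """Select the host with the lowest estimated completion time."""
    best = min(proposals, key=lambda p: p.estimated_seconds)
    return best.host_id

host = assign_task([Proposal("host-a", 15.0), Proposal("host-b", 9.5)])
# -> "host-b"
```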
[0066] The time required to encode a streamlet depends on the computing power of the host 314 and the encoding requirements of the content file of the original media content. Examples of encoding requirements may include, but are not limited to, two-pass or multi-pass encoding and multiple streams of different bit rates. One benefit of the present invention is the ability to perform two-pass encoding of a live content file. Typically, in order to perform two-pass encoding, prior art encoding systems must wait for the content file to be completed before encoding. Streamlets, however, can be encoded as many times as deemed necessary. Because a streamlet is an encapsulated media object of short duration (for example, 2 seconds), multi-pass encoding can begin on a live event once the first streamlet is captured.
[0067] In one embodiment, the encoder 220 segments the original content file into source streamlets and performs two-pass encoding of multiple copies (for example, streams) of each corresponding raw streamlet 312 without waiting for a TV show to end, for example. As such, the web server 316 is able to stream the streamlets over the Internet shortly after the streamlet generation system 301 starts capturing the original content file. The delay between a live broadcast transmitted from the publisher 310 and the availability of the content depends on the computing power of the hosts 314.
[0068] Figure 3B is a schematic block diagram that illustrates one embodiment of parallel encoding of streamlets 312. In one example, the streamlet generation system 301 starts capturing the original content file, generates a first streamlet 312a, and passes the streamlet to the encoding system 320. The encoding system 320 can take 10 seconds, for example, to generate the first set 306a of streamlets 304a (304a1, 304a2, 304a3, etc. represent streamlets 304 of different bit rates). Figure 3B illustrates the encoding process generically as block 308 to graphically illustrate the length of time required to process a raw or lightly encoded streamlet 312, as described above with reference to the encoding system 320. The encoding system 320 can process more than one streamlet 312 simultaneously, and processing of each streamlet begins upon its arrival from the streamlet generation system 301.
[0069] During the 10 seconds required to encode the first streamlet 312a, the streamlet module 404 generates five additional 2-second streamlets 312b, 312c, 312d, 312e and 312f for encoding, and the master 322 stages the corresponding raw streamlets. Two seconds after the first set 306a is available, the next set 306b is available, and so on. As such, the original content file is encoded at different quality levels for delivery over the Internet and appears to be live. The 10-second delay is given here as an example only. Several hosts 314 can be added to the encoding system 320 in order to increase its processing capacity. The delay can be shortened to an almost imperceptible level by adding high-powered CPU systems or, alternatively, several low-powered systems.
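The timing described above can be checked with a small sketch (illustrative only; it assumes enough hosts are available to encode successive streamlets fully in parallel):

```python
# Hypothetical timing sketch: streamlets are captured every 2 s and each takes
# 10 s to encode; with enough hosts encoding in parallel, finished sets still
# appear at the 2 s capture rate, merely offset by the encoding latency.
CAPTURE_INTERVAL_S = 2
ENCODE_LATENCY_S = 10

def set_available_at(streamlet_index: int) -> int:
    """Wall-clock second at which encoded set i becomes available."""
    capture_done = (streamlet_index + 1) * CAPTURE_INTERVAL_S
    return capture_done + ENCODE_LATENCY_S

print([set_available_at(i) for i in range(4)])  # [12, 14, 16, 18]
```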
[0070] Any specific encoding scheme applied to a streamlet may take longer to complete than the duration of the streamlet itself. For example, very high quality encoding of a 2-second streamlet may take 5 seconds to complete. Alternatively, the processing time required for each streamlet may be less than the duration of the streamlet. However, because the offset parallel encoding of successive streamlets proceeds in the encoding system 320 at regular intervals (matching the intervals at which the streamlets are presented to the encoding system 320, for example, every 2 seconds), the output of the encoding system 320 does not lag behind the real-time presentation rate of the unencoded streamlets 312.
[0071] Returning now to Figure 3A, as shown, the master 322 and the hosts 314 can be located within a single local area network, or in other terms, the hosts 314 can be in physical proximity to the master 322. Alternatively, the hosts 314 can receive encoding tasks from the master 322 over the Internet or another communications network. For example, consider a live sporting event in a remote location where it would be difficult to set up multiple hosts. In this example, the master performs no encoding, or alternatively performs light encoding, before publishing the streamlets online. The hosts 314 then retrieve the streamlets and encode them into the multiple bit rate sets 306, as described above.
[0072] In addition, hosts 314 can be dynamically added to or removed from the encoding system 320 without restarting the encoding task and/or interrupting the publication of streamlets. If a host 314 experiences a crash or failure, its encoding task is simply reassigned to another host.
[0073] The encoding system 320, in one embodiment, can also be configured to produce streamlets that are specific to a particular playback platform. For example, for a single raw streamlet, a single host 314 can produce streamlets of different quality levels for personal computer playback, streamlets for playback on cell phones with a different proprietary codec, a small video-only streamlet for use when rendering only a thumbnail view of the stream (as in a program guide), and a very high quality streamlet for use in archiving.
[0074] In the depicted embodiment, the computing environment 300 includes a content management system (CMS) 340. The CMS 340 is a publishing system that manages the encoded media content, for example, using the streamlet database 308, and allows an editor to generate and modify timelines (referred to here as virtual timelines (QVT)) to schedule the playback of the media content 232. QVTs are metadata that can define a play list for the viewer and can indicate when the media players 200 should play the media content. For example, the timeline may specify a start time for the media content 232 and a current duration for the media content 232 (for example, the number of pieces of the media content available for delivery) to allow the event to be played according to the schedule. In the example above, the encoders 220 update the CMS 340 with information about the streams (for example, copies of the media content 232) to indicate that certain parts (for example, streamlets) of the media content have been sent to the source content server 210 associated with the CDN 240. In this embodiment, the CMS 340 receives information from the encoder 220, such as, for example, any of the following: encryption keys; availability information indicating that the encoder 220 has sent portions of the encoded media content 232 to the source content server 210; information indicating which quality levels are available for a particular piece of media content 232; metadata, including, for example, content air date, title, actresses, actors, a start index, an end index, proprietary publisher data, encryption level, content duration, episode or program name, publisher, tools available for the end user's browsing environment, such as available menus, thumbnails, sidebars, advertising, fast-forward, rewind, pause and play, or the like; or bit rate values, including frame size, audio channel information, codecs, sample rate, and frame parser information. Alternatively, the encoder 220 may send more or less information than the information described above.
[0075] In the depicted embodiment, the computing environment 300 includes a digital rights management (DRM) server 350, which provides digital rights management capability to the system. The DRM server 350 is further configured to provide encryption keys to the end user upon end user authentication. In one embodiment, the DRM server 350 is configured to authenticate a user based on login credentials. A person skilled in the art will recognize the many different ways in which the DRM server 350 can authenticate an end user, including, but not limited to, encrypted cookies, user profile, geo-location, source website, etc.
[0076] In other embodiments, the computing environment 300 may include other devices, such as directory servers, management servers, messaging servers, statistics servers, devices of a network infrastructure operator (for example, an ISP), or the like.
[0077] Figure 4 is a flow diagram of one embodiment of a method 400 of encoding audio of media content according to a codec-enforced frame size so as to divide complete audio frames among content files having fixed-time video portions of the media content. The method 400 is performed by processing logic that may include hardware (circuitry, dedicated logic, or the like), software (such as is run on a general-purpose computer system or a dedicated machine), firmware (for example, embedded software), or any combination thereof. In one embodiment, the method 400 is performed by the encoder 220 of Figures 2 and 3A. In another embodiment, some of the operations of the method can be performed by the fixed frame audio encoder 224 and the audio splitting multiplexer 228 of Figure 2.
[0078] In Figure 4, the processing logic begins by initializing the sample offset to zero (block 402) and receiving a raw audio portion of the media content (block 404). The processing logic encodes the raw audio portion using the fixed-frame audio codec (block 406) and buffers the encoded audio frames that are produced by the audio codec (block 408). The processing logic determines whether there are enough audio frames to fill a streamlet (block 410). In this embodiment, each streamlet also includes video frames whose duration is fixed, as described here. If there are not enough audio frames to fill the streamlet, the processing logic returns to receive a subsequent raw audio portion at block 404, encodes the raw audio portion, and buffers the encoded audio frames at block 408. When the processing logic determines that there are enough audio frames to fill the streamlet at block 410, the processing logic sends the audio frames to the audio splitting multiplexer and removes the sent frames from the buffer (block 412). The processing logic updates the sample offset (block 414) and determines whether the end of the media content has been reached (block 416). If the media content is not at the end at block 416, the processing logic returns to block 404 to receive another raw audio portion. Otherwise, the method ends.
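For illustration, the loop of blocks 402-416 can be sketched as follows; this is a minimal sketch, not the claimed implementation, and it assumes a signed sample-offset convention along with the 48 kHz / AAC LC example values from the text:

```python
SAMPLE_RATE = 48_000   # samples per second (example from the text)
FRAME_SIZE = 1_024     # codec-enforced frame size (AAC LC example)
STREAMLET_S = 2        # fixed time duration of the video portion

def fill_streamlets(raw_audio_parts, encode_part):
    """encode_part(raw) -> list of complete encoded frames of FRAME_SIZE samples."""
    buffered = []       # audio frame buffer; sample offset starts at zero (block 402)
    sample_offset = 0
    for raw in raw_audio_parts:                # block 404
        buffered.extend(encode_part(raw))      # blocks 406-408
        needed = SAMPLE_RATE * STREAMLET_S + sample_offset
        frames_needed = -(-needed // FRAME_SIZE)   # ceiling division (block 410)
        while len(buffered) >= frames_needed:
            yield buffered[:frames_needed]         # fill one streamlet (block 412)
            del buffered[:frames_needed]
            sample_offset = needed - frames_needed * FRAME_SIZE   # block 414
            needed = SAMPLE_RATE * STREAMLET_S + sample_offset
            frames_needed = -(-needed // FRAME_SIZE)
```

With the example values, consecutive streamlets carry 94, 94, 94 and then 93 frames, consistent with the pattern recited in claim 10 below; the offset goes negative here because each 94-frame streamlet carries slightly more than two seconds of audio.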
[0079] As described above in relation to Figure 2, the processing logic can be configured to perform the various operations of the components of the encoder 220. For example, the method 400 can be performed by the fixed frame audio encoder 224, which receives the raw audio 233 from the splitter 222, encodes the audio frames, and stores the encoded audio frames 237 in the audio frame buffer 225. In this embodiment, the operations at blocks 402-408 can be performed by the fixed frame audio encoder 224, while the operations at blocks 410-416 can be performed by the audio splitting multiplexer 228. Alternatively, the operations can be performed by another combination of components of the encoder 220.
[0080] Figures 5A-5C are flow diagrams of one embodiment of generating content files with fixed-time video portions and complete audio frames with codec-enforced frame sizes. Methods 500, 550 and 570 are performed by processing logic that can include hardware (circuitry, dedicated logic, or the like), software (such as is run on a general-purpose computer system or a dedicated machine), firmware (for example, embedded software), or any combination thereof. In one embodiment, methods 500, 550 and 570 are performed by the encoder 220 of Figures 2 and 3A. In another embodiment, method 500 is performed by the fixed frame audio encoder 224, method 550 is performed by the fixed-time video encoder 226, and method 570 is performed by the audio splitting multiplexer 228. Alternatively, the operations of methods 500, 550 and 570 can be performed by another combination of components of the encoder 220.
[0081] In Figure 5A, the processing logic of method 500 begins by receiving a raw audio portion (block 502). The processing logic encodes the raw audio portion according to a codec-enforced frame size (block 504) and buffers the encoded audio frames (block 506). The processing logic determines whether the end of the media content has been reached (block 508). If the media content is not at the end at block 508, the processing logic returns to block 502 to receive another raw audio portion. Otherwise, the method ends.
[0082] In Figure 5B, the processing logic of method 550 begins by receiving a raw video portion (block 552). The processing logic encodes the raw video portion according to a frame rate (block 554) and accumulates the encoded video frames (block 556). The processing logic determines whether the end of the media content has been reached (block 558). If the media content is not at the end at block 558, the processing logic returns to block 552 to receive another raw video portion. Otherwise, the method ends.
[0083] In Figure 5C, the processing logic of method 570 begins by receiving encoded audio frames from the buffer (block 572) and receiving encoded video frames from the buffer (block 574). The processing logic generates a streamlet (block 576) and sends the streamlet to the source content server (block 578). The processing logic determines whether the end of the media content has been reached (block 580). If the media content is not at the end at block 580, the processing logic returns to block 572. Otherwise, the method ends.
[0084] In one embodiment, the processing logic at block 576 determines how many video frames and how many audio frames are needed to fill the streamlet. In one embodiment, the number of video frames for each streamlet is approximately fixed according to the fixed time duration. For example, if the frame rate is 30 fps, then there will be 60 frames in a two-second streamlet. It should be noted, however, that in practice the video is not always exactly 30 fps, but 29.97 fps. Thus, some two-second streamlets can have 59 frames, some can have 60, and some can even have 61 frames. Each frame in a streamlet has a presentation time relative to the beginning of the streamlet. Thus, if a streamlet represents seconds 30-32, the first frame in that streamlet can have a presentation time of 6 ms, instead of 0; that frame would be displayed at 30,006 ms from the beginning of the stream. In the live case, if computational resources are limited and the encoder is unable to keep up with the live horizon, the encoder can drop frames to catch up. Thus, some streamlets may have gaps in the video, which can be another cause of variation in the number of frames per streamlet. Alternatively, frame rates other than 30 fps can be used, such as 24 fps or the like. The number of audio frames for each streamlet is not fixed. The number of audio frames is determined by the operations described above in relation to the audio splitting multiplexer 228. The processing logic determines whether there are enough complete frames stored in the buffer to fill the current streamlet. If there are not enough audio frames, the processing logic receives and encodes a subsequent audio portion, for example, to obtain a complete audio frame from the subsequent portion, as described here. In some cases, the duration of the audio frames in a streamlet may be longer than the fixed time duration, and in other cases, the duration of the audio frames may be shorter than the fixed time duration.
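Continuing the sketch given after paragraph [0078], the duration and presentation-offset arithmetic (compare claims 7 and 8 below) can be illustrated with the 48 kHz example; the signed offset convention remains an assumption of the sketch:

```python
SAMPLE_RATE, FRAME_SIZE, FIXED_S = 48_000, 1_024, 2.0

sample_offset = 0
for i in range(4):
    needed = int(SAMPLE_RATE * FIXED_S) + sample_offset
    frames = -(-needed // FRAME_SIZE)         # smallest sufficient whole-frame count
    samples = frames * FRAME_SIZE             # convert back to a sample count
    duration = samples / SAMPLE_RATE          # audio duration of this streamlet
    presentation_offset = FIXED_S - duration  # carried into the next streamlet
    sample_offset = needed - samples
    print(i + 1, frames, round(duration, 5), round(presentation_offset, 5))
# streamlets 1-3: 94 frames, ~2.00533 s of audio (slightly more than 2 s);
# streamlet 4: 93 frames, 1.984 s of audio (slightly less than 2 s)
```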
[0085] Figure 7 illustrates a schematic representation of a machine in the exemplary form of a computer system 700 for audio splitting. Within the computer system 700, a set of instructions may be executed for causing the machine to perform any one or more of the audio splitting methodologies discussed here. In alternative embodiments, the machine can be connected (for example, networked) to other machines in a LAN, an intranet, an extranet, or the Internet. The machine can operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a PC, a tablet PC, an STB, a PDA, a cell phone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specifies actions to be taken by that machine. In addition, although only a single machine is illustrated, the term "machine" should also be interpreted as including any collection of machines that individually or jointly execute a set (or several sets) of instructions to perform any one or more of the methodologies discussed here for audio splitting operations, such as the methods 400, 500, 550 and 570 described above. In one embodiment, the computer system 700 represents several components that can be implemented in the encoder 220 or in the encoding system 320 as described above. Alternatively, the encoder 220 or the encoding system 320 may include more or fewer components than illustrated in the computer system 700.
[0086] The exemplary computer system 700 includes a processing device 702, a main memory 704 (for example, read-only memory (ROM), flash memory, dynamic random access memory (DRAM), such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 706 (for example, flash memory, static random access memory (SRAM), etc.) and a data storage device 716, which communicate with each other via a bus 730.
[0087] The processing device 702 represents one or more general-purpose processing devices, such as a microprocessor, a central processing unit, or the like. More particularly, the processing device 702 can be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processing device 702 can also be one or more special-purpose processing devices, such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. The processing device 702 is configured to execute the processing logic (for example, audio splitting 726) for performing the operations and steps discussed here.
[0088] The computer system 700 may also include a network interface device 722. The computer system 700 may also include a video display unit 710 (for example, a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 712 (for example, a keyboard), a cursor control device 714 (for example, a mouse) and a signal generation device 720 (for example, a speaker).
[0089] The data storage device 716 may include a computer-readable storage medium 724 on which one or more sets of instructions (for example, audio splitting 726) are stored, embodying any one or more of the methodologies or functions described here. The audio splitting instructions 726 can also reside, completely or at least partially, within the main memory 704 and/or within the processing device 702 during execution thereof by the computer system 700, the main memory 704 and the processing device 702 also constituting computer-readable storage media. The audio splitting instructions 726 can also be transmitted or received over a network via the network interface device 722.
[0090] Although the computer-readable storage medium 724 is shown in an exemplary embodiment to be a single medium, the term "computer-readable storage medium" should be taken to include a single medium or multiple media (for example, a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable storage medium" should also be interpreted as including any medium that is capable of storing a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present embodiments. The term "computer-readable storage medium" should therefore be taken to include, but not be limited to, solid-state memories, optical media, magnetic media, or other types of media for storing instructions. The term "computer-readable transmission medium" should be taken to include any medium that is capable of transmitting a set of instructions for execution by the machine, causing the machine to perform any one or more of the methodologies of the present embodiments.
[0091] The audio splitting module 732, components, and other features described here (for example, in relation to Figures 2 and 3A) can be implemented as discrete hardware components or integrated into the functionality of hardware components, such as ASICs, FPGAs, DSPs, or similar devices. In addition, the audio splitting module 732 can be implemented as firmware or functional circuitry within hardware devices. Furthermore, the audio splitting module 732 can be implemented in any combination of hardware devices and software components.
[0092] The foregoing description, for the purpose of explanation, has been set forth with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, so as to enable others skilled in the art to use the invention and the various embodiments with various modifications as may be suited to the particular use contemplated.
CLAIMS
1. Audio splitting method with codec-enforced frame sizes, comprising: receiving, by a computing system (700), media content including audio and video; encoding, by the computing system (700), the video according to a frame rate; encoding, by the computing system (700), the audio according to a codec-enforced frame size; and generating, by the computing system (700), a plurality of content files (232), characterized in that each of the plurality of content files (232) comprises an encoded portion of the video having a fixed time duration and an encoded portion of the audio having a plurality of complete audio frames having the codec-enforced frame size, wherein a duration of the encoded portion of the audio of one or more of the plurality of content files (232) is greater or less than the fixed time duration.
2. Audio splitting method with codec-enforced frame sizes, according to Claim 1, characterized in that the last of the plurality of complete audio frames is not padded with zeros; and/or further comprising splitting the media content (231) into audio (233) and video (235), wherein said encoding of the video comprises encoding the video using a video codec (226) according to the fixed time duration, and wherein said encoding of the audio comprises encoding the audio using an audio codec according to the codec-enforced frame size.
3. Audio splitting method with codec-enforced frame sizes, according to Claim 1, further comprising: buffering the encoded frames of the audio (225); determining a number of encoded frames needed to fill a current one of the plurality of content files (232), characterized in that the number of frames is the smallest integer that is not less than the number of samples needed to fill the current one of the plurality of content files divided by the codec-enforced frame size; determining whether there are enough buffered encoded frames to fill the current one of the plurality of content files (232); if there are enough buffered encoded frames, filling the current one of the plurality of content files (232) with the number of frames; and if there are not enough buffered encoded frames, buffering an additional audio frame and filling the current one of the plurality of content files (232) with the number of frames and the additional frame.
4. Audio splitting method with codec-enforced frame sizes, according to Claim 3, characterized in that said determining whether there are enough buffered encoded frames comprises: multiplying the number of buffered frames by the codec-enforced frame size; adding a sample offset, if any, from a previous one of the plurality of content files (232) to the product of the multiplication; and determining whether the sum is greater than or equal to the number of samples needed to fill the current one of the plurality of content files (232); and/or further comprises determining a sample offset, if any, for a subsequent one of the plurality of content files (232).
5. Audio splitting method with codec-enforced frame sizes, according to Claim 4, characterized in that said determining of the sample offset comprises multiplying the number of encoded frames by the codec-enforced frame size, minus the number of samples needed to fill the current one of the plurality of content files (232), plus the sample offset, if any, from a previous one of the plurality of content files (232).
6. Audio splitting method with codec-enforced frame sizes, according to Claim 1, further comprising buffering the encoded frames of the audio (225), and characterized in that said generating of the plurality of content files (232) comprises: calculating the number of samples needed to fill a current one of the plurality of content files (232); calculating a number of frames needed for the current one of the plurality of content files (232); adding a frame to the number of frames when the number of samples is not evenly divisible by the codec-enforced frame size; and filling the current one of the plurality of content files (232) with the number of frames.
7. Audio splitting method with codec-enforced frame sizes, according to Claim 1, further comprising buffering the encoded frames of the audio (225), and characterized in that said generating of the plurality of content files (232) comprises: calculating a number of samples needed to fill a current content file (232) of the one or more of the plurality of content files (232) by multiplying a sample rate by the fixed time duration, plus a sample offset, if any, from a previous one of the plurality of content files (232); calculating a number of frames needed to fill the current content file (232) by dividing the number of samples by the codec-enforced frame size; if the remainder of the division is zero, filling the current content file (232) with the number of frames; and if the remainder of the division is greater than zero, incrementing the number of frames by one and filling the current content file (232) with the incremented number of frames.
8. Audio splitting method with codec-enforced frame sizes, according to Claim 7, characterized in that said generating of the plurality of content files (232) further comprises: multiplying the number of frames by the codec-enforced frame size to convert back to the number of samples needed to fill the current content file (232); calculating the duration of the encoded audio portion of the current content file (232) by dividing the number of samples by the sample rate; determining a presentation offset for a subsequent one of the plurality of content files (232) by subtracting the duration from the fixed time duration; and updating a sample offset for the subsequent one of the plurality of content files (232) by multiplying the number of frames by the codec-enforced frame size, minus the number of samples needed to fill the current one of the plurality of content files (232), plus the sample offset, if any, from the previous one of the plurality of content files (232).
9. Audio splitting method with codec-enforced frame sizes, according to Claim 1, characterized in that said receiving comprises receiving the media content (231) as a plurality of raw streamlets, each of the plurality of raw streamlets comprising a portion of the media content (231) having a fixed time duration; and/or said receiving of the media content (231) comprises: receiving a first one of the plurality of raw streamlets and a second one of the plurality of raw streamlets; and splitting the audio (233) and video (235) of the first raw streamlet and of the second raw streamlet; said encoding of the video comprises: encoding the video of the first raw streamlet, wherein the video of the first raw streamlet is stored in a first one of the plurality of content files (232), and encoding the video of the second raw streamlet, wherein the video of the second raw streamlet is stored in a second one of the plurality of content files (232); said encoding of the audio comprises: encoding the audio of the first raw streamlet into a first plurality of audio frames; buffering the first plurality of audio frames; determining whether there are enough buffered frames to fill the first content file (232); when there are not enough buffered frames to fill the first content file (232), encoding the audio of the second raw streamlet into a second plurality of audio frames and buffering the second plurality of audio frames; and when there are enough buffered frames to fill the first content file (232), storing the buffered audio frames in the first content file (232).
10. Audio splitting method with codec-enforced frame sizes, according to Claim 1, characterized in that said fixed time duration is two seconds, wherein the audio is sampled at 48,000 samples per second, wherein the codec-enforced frame size is 1,024 samples per frame, wherein the audio portions of the first three of the plurality of content files (232) each comprise ninety-four audio frames, wherein the audio portion of a fourth one of the plurality of content files (232) comprises ninety-three audio frames, and wherein each of the video portions of the four content files (232) comprises sixty video frames.
11. Audio splitting method with codec-enforced frame sizes, according to Claim 1, characterized in that said fixed time duration is two seconds, wherein the audio is sampled at 44,100 samples per second, wherein the codec-enforced frame size is 1,024 samples per frame, and wherein the audio portion of a first one of the plurality of content files (232) comprises eighty-seven audio frames and that of a second one of the plurality of content files (232) comprises eighty-six audio frames; and/or the codec-enforced frame size is 2,048 samples per frame.
12. Computing system (700), comprising: means for receiving media content (231), including video and audio; means for encoding the video according to a frame rate; means for encoding the audio according to a fixed frame size; means for segmenting the encoded video (239) into a plurality of portions, characterized in that each portion of the encoded video (239) is stored in a separate content file (232); and means for splitting the encoded audio (237) into the separate content files (232) without introducing boundary artifacts, wherein the encoded audio (237) of a first content file (232) of the separate content files (232) has a duration that is greater or less than a duration of the encoded video portion (239) stored in the first content file (232).
13. Computing system (700), according to Claim 12, characterized in that it further comprises: means for tracking a sample offset, if any, for each of the content files (232); and means for tracking a presentation offset, if any, for each of the content files (232).
14. Computing device, comprising: a splitter (222) for receiving media content (231), including audio and video, and for splitting the audio and video; a video encoder (226) coupled to receive the video from the splitter (222) and to encode the video according to a frame rate; an audio encoder (224) coupled to receive the audio from the splitter (222) and to encode the audio according to a codec-enforced frame size; and an audio splitting multiplexer (228) for generating a plurality of content files (232), characterized in that each of the plurality of content files (232) comprises an encoded portion of the video having a fixed time duration and an encoded portion of the audio having a plurality of complete audio frames having the codec-enforced frame size, wherein the duration of the encoded portion of the audio of one or more of the plurality of content files (232) is greater or less than the fixed time duration.
15. Computing device, according to Claim 14, characterized in that the last of the plurality of complete audio frames is not padded with zeros.
16. Computing device, according to Claim 14, characterized in that the computing device further comprises an audio frame buffer (225) for buffering the encoded frames of the audio.
17. Non-transitory computer-readable storage medium, characterized in that it comprises instructions to perform the method as defined in Claim 1, including the following steps: receiving media content (231), including audio and video; encoding the video according to a frame rate; encoding the audio according to a codec-enforced frame size; and generating a plurality of content files (232), each of the plurality of content files (232) comprising an encoded portion of the video having a fixed time duration and an encoded portion of the audio having a plurality of complete audio frames having the codec-enforced frame size, wherein a duration of the encoded portion of the audio of one or more of the plurality of content files (232) is greater or less than the fixed time duration.
18. Non-transitory computer-readable storage medium, according to Claim 17, the method further comprising: buffering the encoded frames of the audio (225); determining a number of encoded frames needed to fill a current one of the plurality of content files (232), characterized in that the number of frames is the smallest integer that is not less than a number of samples needed to fill the current one of the plurality of content files divided by the codec-enforced frame size; determining whether there are enough of the buffered encoded frames to fill the current one of the plurality of content files (232); if there are enough of the buffered encoded frames, filling the current one of the plurality of content files (232) with the number of frames; and if there are not enough buffered encoded frames, buffering an additional audio frame and filling the current one of the plurality of content files (232) with the number of frames and the additional frame.