Patent abstract:
METHOD FOR PRODUCING A PLANT, METHOD FOR MODULATING THE COMPOSITION OF BIOMASS IN A PLANT, PLANT CELL, TRANSGENIC PLANT, SEED PRODUCT, ISOLATED NUCLEIC ACID, METHOD OF IDENTIFYING WHETHER A POLYMORPHISM IS ASSOCIATED WITH VARIATION IN A PLANT TRAIT, METHOD TO CHANGE THE COMPOSITION OF BIOMASS IN A PLANT. Methods and materials for modulating biomass composition in plants are described. For example, nucleic acids encoding polypeptides that modulate biomass composition are described, as well as methods for using such nucleic acids to transform plant cells. Also described are plants having altered biomass composition and plant products produced from plants having altered biomass composition.
Publication number: BR112013010278B1
Application number: R112013010278-0
Filing date: 2011-10-25
Publication date: 2020-12-29
Inventor: Nestor Apuya
Applicant: Ceres, Inc.
Main IPC classification:
Patent description:

CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of US Provisional Application No. 61/407,280, filed on October 27, 2010. The content of the above application is incorporated herein in its entirety by reference.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING
[0002] The attached file, named 11696-0280WO1_Sequence_Listing, was created on September 30, 2011, and is 2.16 KB. This file can be accessed using Microsoft Word on a computer using Windows OS.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0003] This invention was made with government support under USDA Biomass Research and Development Initiative grants 68-3A75-7-601 and 68-3A75-6-501. The government has certain rights in the invention. The material in the attached sequence listing is incorporated into this application in its entirety by reference.
TECHNICAL FIELD
[0004] This document describes methods and materials involved in modulating the composition of biomass in plants. For example, this document provides plants with altered sucrose content or conversion efficiency, as well as materials and methods for making plants and plant products with altered sucrose content or conversion efficiency.
BACKGROUND
[0005] Plants store energy from sunlight in the form of chemical bonds that make up plants. The energy stored in the plant material can be converted into forms of energy such as heat, electricity and liquid fuels, depending on the plant material used and the process applied to extract energy from it. Other processes can produce chemical intermediates from plant biomass that are useful in a variety of industrial processes, for example, lactic acid, succinic acid, etc.
[0006] Plant matter has been used for millennia by humans to generate heat through direct combustion in air. For residential and process heating purposes, this heat is generally used to generate steam, which is a more easily transportable heat source and can be used to heat homes and public areas using heat exchangers of various designs. Steam production can also be used to drive turbines, which transform thermal energy into electrical energy. These processes typically involve simple, direct combustion of plant matter alone, or co-firing with coal or another energy source. Fuels such as ethanol can be produced from plant matter by a variety of different processes. For example, sucrose from sugar cane can be extracted from the plant matter and directly fermented to ethanol using a microorganism, such as brewer's yeast. Brazil has converted a significant portion of its transport sector to ethanol derived from sugar cane, demonstrating that this can be done on a very large scale and over a wide geography. As another example, corn starch can be processed using α-amylase and glucoamylase to release free glucose, which is subsequently fermented to ethanol. The United States uses a significant portion of its corn crop to produce ethanol from starch. While these advances are significant, the ability to increase the amount of liquid transport fuel obtained from plant matter is limited and insufficient to achieve the federal renewable energy target, because only a small fraction of the solar energy captured and transformed into chemical energy in plants is converted into biofuels in these industrial processes.
[0007] Plant matter can be used for the production of cellulosic biofuels by biochemical processes using enzymes and/or microorganisms, or by thermochemical processes such as Biomass-to-Liquids (BtL) technology using high temperatures and non-enzymatic catalysts. There are also examples of hybrid thermochemical/biochemical processes. Biochemical processes typically employ physical and chemical pretreatments, enzymes, and microorganisms to deconstruct the lignocellulosic matrix of biomass in order to release fermentable material from cellulose, hemicellulose, and other cell wall carbohydrates, which is subsequently fermented to ethanol by a microorganism. Currently, several processing methods are being developed for the production of biofuels that employ different pretreatment strategies, enzyme cocktails, and microorganisms. Several of these processes are focused on ethanol production, but butanol and other useful molecules (for example, lactic acid, succinic acid, polyalkanoates, etc.) can also be produced in this type of process. The molecule produced in the conversion is usually determined by the microorganisms selected for fermentation.
[0008] Thermochemical processes employ very high temperatures in a low-oxygen (i.e., O2) atmosphere to completely degrade the organic constituents of biomass to synthesis gas, composed primarily of gaseous molecular hydrogen (H2) and carbon monoxide (CO). These simple molecules are then converted into more useful and valuable molecules (fuels or chemical intermediates) using a Fischer-Tropsch process or other methods, usually employing a chemical catalyst of some kind. These processes are effective in the production of biofuels that are similar to hydrocarbon fuels (i.e., gasoline, diesel, jet fuel), although other biofuel molecules can also be produced in these types of processes (e.g., ethanol, butanol, kerosene).
[0009] A variant form of thermochemical processing uses pyrolysis (i.e., thermal degradation in the complete absence of oxygen) to partially degrade the organic constituents present in plant biomass to a chemically heterogeneous liquid bio-oil. This serves to increase the energy density of the biomass to facilitate transport to centralized processing plants, where the bio-oil is further processed into a desired set of products.
[0010] The economic viability of biomass conversion processes is significantly impacted by the composition of plant matter and its efficiency of conversion to heat, electricity, biofuels or chemical intermediates under specific processing conditions. For biochemical processes that produce biofuels or other chemicals, the recalcitrance of the biomass cellulosic matrix is a major factor in the conversion efficiency.
SUMMARY OF THE INVENTION
[0011] The present invention describes methods for altering the composition of biomass in plants, and plants generated by those methods. Plants with altered biomass composition are useful for agriculture, pasture, horticulture, conversion of biomass into energy, paper production, production of plant compounds, and other industries. For example, this document presents dedicated energy crops such as Panicum virgatum L. (switchgrass), Miscanthus x giganteus (miscanthus), Sorghum sp., and Saccharum sp. (sugar cane) with altered biomass composition.
[0012] This document presents a method for the production of a plant. The method includes growing a plant cell comprising an exogenous nucleic acid. The exogenous nucleic acid includes a regulatory region operably linked to a nucleotide sequence that encodes a polypeptide, where the HMM score of the amino acid sequence of the polypeptide is greater than about 65, based on the HMM of the amino acid sequences represented in one of Figures 1-12. A plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
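The score cutoff described above can be illustrated with a minimal sketch. It assumes that HMM scores have already been computed by some profile-HMM tool; the candidate names and score values below are hypothetical, and treating "greater than about 65" as a strict comparison against 65.0 is a simplifying assumption rather than a definition taken from the text.

```python
# Minimal sketch: keep candidate polypeptides whose HMM score exceeds a cutoff.
# The cutoff value comes from the text; everything else here is illustrative.
HMM_SCORE_THRESHOLD = 65.0


def select_candidates(scores, threshold=HMM_SCORE_THRESHOLD):
    """Return the names whose score is strictly greater than the threshold, sorted."""
    return sorted(name for name, score in scores.items() if score > threshold)


# Hypothetical scores for three hypothetical candidate sequences.
example_scores = {
    "candidate_A": 412.7,
    "candidate_B": 65.0,   # exactly at the cutoff, so not selected here
    "candidate_C": 12.3,
}

selected = select_candidates(example_scores)
print(selected)
```

With these made-up values only `candidate_A` is retained; a real workflow would take its scores from the HMM built on the alignments in the Figures.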
[0013] This document also presents a method for the production of a plant that includes growing a plant cell comprising an exogenous nucleic acid. The exogenous nucleic acid includes a regulatory region operably linked to a nucleotide sequence that encodes a polypeptide with 80 percent or more sequence identity to an amino acid sequence chosen from the group consisting of SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823. A plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
Petition 870190069500, of 7/22/2019, p. 15/217 7/182
[0014] In another aspect, this document presents a method for the production of a plant that includes growing a plant cell comprising an exogenous nucleic acid, where the exogenous nucleic acid includes a regulatory region operably linked to a nucleotide sequence with 80 percent or more sequence identity to a nucleotide sequence chosen from the group consisting of SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822, or a fragment thereof. A plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
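The "80 percent or more sequence identity" criterion used in the preceding paragraphs can be illustrated with a short sketch. It assumes the two sequences have already been aligned by an external tool (gaps written as `-`) and uses one common convention, identical aligned positions divided by the ungapped length of the reference sequence; this section does not state which convention applies, so both the convention and the toy sequences below are assumptions for illustration only.

```python
# Minimal sketch: percent identity of a candidate against a reference,
# computed from a pre-computed pairwise alignment (gap character "-").
# Assumed convention: matches / ungapped reference length * 100.


def percent_identity(aligned_reference, aligned_candidate):
    """Percent of reference residues that are identical in the candidate."""
    if len(aligned_reference) != len(aligned_candidate):
        raise ValueError("aligned sequences must have the same length")
    reference_length = sum(1 for r in aligned_reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1
        for r, c in zip(aligned_reference, aligned_candidate)
        if r == c and r != "-"
    )
    return 100.0 * matches / reference_length


def meets_identity(aligned_reference, aligned_candidate, minimum=80.0):
    """True when the candidate reaches the minimum percent identity."""
    return percent_identity(aligned_reference, aligned_candidate) >= minimum


# Toy alignment (hypothetical sequences): 8 matches over 9 reference residues.
reference = "ATG-CCGTAA"
candidate = "ATGACCGTTA"
print(round(percent_identity(reference, candidate), 2))
print(meets_identity(reference, candidate))
```

The same function works for amino acid alignments, since it only compares characters position by position.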
[0015] This document also presents a method for the production of a plant that includes growing a plant cell comprising an exogenous nucleic acid. The exogenous nucleic acid is effective in down-regulating an endogenous nucleic acid in the plant cell, where the endogenous nucleic acid encodes a polypeptide, and where the HMM score of the amino acid sequence of the polypeptide is greater than about 65, where the HMM is based on the amino acid sequences represented in one of Figures 1-12.
[0016] In another aspect, this document presents a method for modulating the biomass composition in a plant. The method includes introducing an exogenous nucleic acid into a plant cell, the exogenous nucleic acid comprising a regulatory region operably linked to a nucleotide sequence that encodes a polypeptide, where the HMM score of the amino acid sequence of the polypeptide is greater than about 65, where the HMM is based on the amino acid sequences represented in one of Figures 1-12, and where a plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0017] A method for modulating the biomass composition in a plant is also described. The method includes introducing an exogenous nucleic acid into a plant cell, the exogenous nucleic acid comprising a regulatory region operably linked to a nucleotide sequence that encodes a polypeptide with 80 percent or more sequence identity to an amino acid sequence chosen from the group consisting of SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823. A plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0018] In the methods described here, the polypeptide can include a heavy metal-associated domain with 60 percent or more sequence identity to residues 6 to 73 of SEQ ID NO: 562. The polypeptide can include a Myb-like DNA-binding domain with 60 percent or more sequence identity to residues 212 to 263 of SEQ ID NO: 246. The polypeptide can include a DUF1070 domain with 60 percent or more sequence identity to residues 4 to 52 of SEQ ID NO: 111. The polypeptide can include a glycosyl hydrolase family 16 domain and a xyloglucan endotransglycosylase (XET) domain with 60 percent or more sequence identity to residues 39 to 224 and 246 to 292 of SEQ ID NO: 348, respectively. The polypeptide can include an Alpha-L-AF_C domain with 60 percent or more sequence identity to residues 454 to 643 of SEQ ID NO: 774 and a CBM_4_9 domain with 60 percent or more sequence identity to residues 71 to 229 of SEQ ID NO: 774. The polypeptide can include a COBRA domain with 60 percent or more sequence identity to residues 45 to 209 of SEQ ID NO: 416. The polypeptide can include a glycosyl transferase family 8 domain with 60 percent or more sequence identity to residues 30 to 253 of SEQ ID NO: 2. The polypeptide can include a DUF563 domain with 60 percent or more sequence identity to residues 196 to 439 of SEQ ID NO: 157. The polypeptide can include an XG_FTase domain with 60 percent or more sequence identity to residues 72 to 574 of SEQ ID NO: 280. The polypeptide can include a glycosyl hydrolase family 16 domain with 60 percent or more sequence identity to residues 23 to 204 of SEQ ID NO: 641 and an XET domain with 60 percent or more sequence identity to residues 228 to 280 of SEQ ID NO: 641. The polypeptide can include a potato inhibitor I family domain with 60 percent or more sequence identity to residues 17 to 76 of SEQ ID NO: 26.
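The residue ranges quoted above are 1-based and inclusive, which differs from the 0-based, end-exclusive slices used in most programming languages. The sketch below shows the conversion for a few of the ranges listed in the paragraph; the dictionary keys, the helper names, and the toy sequence are illustrative assumptions, and the actual SEQ ID sequences are not reproduced here.

```python
# Minimal sketch: turn 1-based inclusive residue ranges into Python slices.
# Each entry is (SEQ ID NO, first residue, last residue), copied from the text
# above; the dictionary keys are illustrative labels only.
DOMAIN_RANGES = {
    "heavy_metal_associated": (562, 6, 73),
    "myb_like_dna_binding": (246, 212, 263),
    "duf1070": (111, 4, 52),
    "cobra": (416, 45, 209),
    "potato_inhibitor_I": (26, 17, 76),
}


def region_length(start, end):
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def extract_region(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a sequence string."""
    if start < 1 or end < start or end > len(sequence):
        raise ValueError("residue range lies outside the sequence")
    return sequence[start - 1:end]


# Toy sequence (hypothetical), used only to show the index arithmetic.
toy_sequence = "ACDEFGHIKL"
print(extract_region(toy_sequence, 3, 5))

# Length of the heavy metal-associated domain range (residues 6 to 73).
_, first, last = DOMAIN_RANGES["heavy_metal_associated"]
print(region_length(first, last))
```

A region extracted this way could then be compared against the corresponding region of a candidate polypeptide when checking the 60 percent identity criterion.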
[0019] In the methods described here, the polypeptide can be selected from the group consisting of SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823.
[0020] This document also presents a method for modulating the biomass composition in a plant. The method includes introducing an exogenous nucleic acid into a plant cell, the exogenous nucleic acid comprising a regulatory region operably linked to a nucleotide sequence with 80 percent or more sequence identity to a nucleotide sequence chosen from the group consisting of SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822, or a fragment thereof. A plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0021] In another aspect, this document presents a plant cell that includes an exogenous nucleic acid. The exogenous nucleic acid includes a regulatory region operably linked to a nucleotide sequence that encodes a polypeptide, where the HMM score of the amino acid sequence of the polypeptide is greater than about 65, where the HMM is based on the amino acid sequences represented in one of Figures 1-12, and where a plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0022] This document also presents a plant cell that includes an exogenous nucleic acid, where the exogenous nucleic acid includes a regulatory region operably linked to a nucleotide sequence that encodes a polypeptide with 80 percent or more sequence identity to an amino acid sequence chosen from the group consisting of SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823, where a plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0023] In a further aspect, this document presents a plant cell that includes an exogenous nucleic acid. The exogenous nucleic acid includes a regulatory region operably linked to a nucleotide sequence with 80 percent or more sequence identity to a nucleotide sequence chosen from the group consisting of SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822, or a fragment thereof, where a plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0024] This document also presents a transgenic plant comprising any of the plant cells described here. The plant may be a member of a species chosen from the group consisting of Panicum virgatum (switchgrass), Sorghum bicolor (sorghum), Miscanthus giganteus (miscanthus), Saccharum sp. (energy cane), Populus balsamifera (balsam poplar), Zea mays (corn), Glycine max (soybean), Brassica napus (canola), Triticum aestivum (wheat), Gossypium hirsutum (cotton), Oryza sativa (rice), Helianthus annuus (sunflower), Medicago sativa (alfalfa), Beta vulgaris (beet), and Pennisetum glaucum (pearl millet). A transgenic plant may include a polypeptide chosen from the group consisting of SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823. A seed product can include embryonic tissue from a transgenic plant described here.
[0025] This document also presents an isolated nucleic acid that includes a nucleotide sequence with a sequence identity of 85% or more to the nucleotide sequence set forth in SEQ ID NO: 9, 13, 16, 23, 166, 169, 186, 198, 212, 219, 229, 231, 235, 265, 267, 269, 287, 307, 313, 322, 324, 330, 332, 334, 341, 343, 354, 356, 385, 387, 389, 395, 401, 411, 542, 550, 553, 558, 571, 579, 585, 591, 593, 597, 600, 606, 614, 618, 623, 628, 631, 635, or 637.
[0026] In another aspect, an isolated nucleic acid is presented that includes a nucleotide sequence encoding a polypeptide with a sequence identity of 80% or more to the amino acid sequence set forth in SEQ ID NO: 8, 10, 14, 15, 17, 21, 22, 24, 57, 167, 170, 187, 213, 220, 230, 232, 236, 266, 268, 270, 285, 286, 288, 290, 295, 296, 297, 299, 308, 309, 310, 311, 314, 317, 318, 323, 325, 327, 329, 331, 333, 335, 338, 342, 344, 355, 357, 360, 362, 363, 364, 366, 374, 377, 381, 386, 388, 390, 392, 393, 394, 396, 402, 408, 412, 413, 414, 493, 543, 551, 554, 557, 559, 572, 573, 574, 575, 586, 589, 590, 592, 594, 598, 599, 601, 602, 603, 607, 609, 615, 619, 622, 624, 625, 629, 630, 632, 636, 638, 776, 814, 815, 816, 817, 818, 819, 820, or 821.
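The identity thresholds recited in the preceding aspects (80 percent, 85 percent, and so on) can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned, with '-' marking gaps, and counts identical positions over the compared columns. The function names and the example sequences are hypothetical, and the way percent identity is calculated elsewhere in this specification may differ (for example, in how gaps or the reference length are treated).

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of compared alignment columns that hold identical residues.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps. Columns that are gaps in both
    sequences are skipped; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for q, s in zip(aligned_query.upper(), aligned_subject.upper()):
        if q == "-" and s == "-":
            continue
        compared += 1
        if q == s:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_query: str, aligned_subject: str,
                    threshold: float = 80.0) -> bool:
    """True when the pair reaches the given percent identity threshold."""
    return percent_identity(aligned_query, aligned_subject) >= threshold


# Hypothetical six-column alignment: five identical columns out of six.
print(round(percent_identity("MKT-AL", "MKTGAL"), 1))
print(meets_threshold("MKT-AL", "MKTGAL", 80.0))
```

In this toy alignment five of six columns match, so the pair passes an 80 percent threshold but would fail an 85 percent threshold.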
[0027] This document also presents a method for identifying whether or not a polymorphism is associated with variation in a trait. The method includes determining whether one or more genetic polymorphisms in a plant population are associated with the locus of a polypeptide selected from the group consisting of the polypeptides represented in Figures 1-12 and functional homologues thereof; and measuring the correlation between the variation in the trait in plants of the population and the presence of the one or more genetic polymorphisms in plants of the population, thereby identifying whether or not the one or more polymorphisms are associated with the variation in the trait. The variation in biomass composition can be a variation in sucrose content or in conversion efficiency. The population may be a population of grasses.
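The correlation step of this method can be sketched as follows, purely as an illustration. The sketch assumes that each plant is scored 1 or 0 for the presence of a polymorphism and that a trait value such as sucrose content has been measured for the same plants; it then computes a Pearson correlation coefficient. The data, names, and choice of statistic are assumptions made for the example, and a real association analysis would normally add significance testing and account for population structure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in one of the inputs")
    return sxy / sqrt(sxx * syy)


# Hypothetical population of six plants.
# 1 = polymorphism present, 0 = absent.
polymorphism_present = [1, 1, 1, 0, 0, 0]
# Hypothetical trait values (for example, sucrose content in arbitrary units).
trait_values = [12.0, 13.0, 14.0, 8.0, 9.0, 10.0]

r = pearson_r(polymorphism_present, trait_values)
print(f"correlation between polymorphism and trait: {r:.3f}")
```

A coefficient near +1 or -1 in such a sketch would suggest that the polymorphism tracks the trait in this sample; values near 0 would suggest no linear association.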
[0028] In another aspect, this document presents a method for making a plant line. The method includes determining whether one or more genetic polymorphisms in a plant population are associated with the locus of a polypeptide selected from the group consisting of the polypeptides represented in Figures 1-12 and functional homologues thereof; identifying one or more plants in the population in which the presence of at least one of the genetic polymorphisms is associated with a variation in biomass composition; crossing one or more of the identified plants with itself or with a different plant to produce seed; crossing at least one progeny plant grown from the seed with itself or with a different plant; and repeating the crossing steps for an additional 0-5 generations to make the plant line, where at least one of the genetic polymorphisms is present in the plant line. The variation in biomass composition can be a variation in sucrose content or in conversion efficiency. The population may be a population of grasses.
[0029] This document also presents a method for altering biomass composition in a plant. The method includes modifying an endogenous nucleic acid that modulates biomass composition, the nucleic acid comprising a nucleotide sequence with an open reading frame with a sequence identity of 80 percent or more (for example, 90 percent or more, or 95 percent or more) to a nucleotide sequence selected from the group consisting of SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822, where the plant has a difference in biomass composition compared to the corresponding composition of a control plant where the nucleic acid has not been modified. The modification can be done by introducing a genetic modification at the locus comprising the nucleic acid. The method may also include selecting plants with altered biomass composition. The endogenous nucleic acid can encode a polypeptide with a sequence identity of 80 percent or more (for example, 90 percent or more, or 95 percent or more) to an amino acid sequence chosen from the group consisting of SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823.
[0030] This document also presents a method for producing a plant. The method includes growing a plant cell containing a modified endogenous nucleic acid that encodes a polypeptide, where the HMM score of the amino acid sequence of the polypeptide is greater than about 65, the HMM being based on the amino acid sequences represented in one of Figures 1-12, and where the plant has a difference in biomass composition when compared to the corresponding composition of a control plant where the nucleic acid has not been modified.
[0031] In another aspect, this document presents a plant cell containing a modified endogenous nucleic acid that encodes a polypeptide, where the HMM score of the amino acid sequence of the polypeptide is greater than about 65, the HMM being based on the amino acid sequences represented in one of Figures 1-12, and where a plant produced from the plant cell has a difference in biomass composition when compared to the corresponding composition of a control plant where the nucleic acid has not been modified.
[0032] In another aspect, this document presents a plant cell containing a modified endogenous nucleic acid that modulates biomass composition. The nucleic acid includes a nucleotide sequence with an open reading frame with a sequence identity of 80 percent or more to a nucleotide sequence selected from the group consisting of SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822, and where a plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant where the nucleic acid has not been modified. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0033] An endogenous nucleic acid can encode a polypeptide with a sequence identity of 80% or more to an amino acid sequence selected from the group consisting of SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823, and where the plant has a difference in biomass composition compared to the corresponding composition of a control plant where the nucleic acid has not been modified. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0034] This document also presents a plant cell that includes an exogenous nucleic acid, the exogenous nucleic acid encoding a polypeptide with E.C. 3.2.1.55 activity, where a plant produced from the plant cell has a difference in biomass composition compared to the corresponding composition of a control plant that does not comprise the nucleic acid. The difference in the biomass composition of the plant may be a difference in sucrose content or conversion efficiency.
[0035] In another aspect, this document presents a method for modulating the biomass composition in a plant. The method includes introducing an exogenous nucleic acid into a plant cell, the exogenous nucleic acid encoding a polypeptide with E.C. 3.2.1.55 activity.
[0036] Unless otherwise defined, all technical and scientific terms used here have the same meaning as is commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used to practice the invention, suitable methods and materials are described below. All publications, patent applications, and other references mentioned here are incorporated in their entirety by reference. In case of conflict, this specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be limiting.
[0037] Details of one or more embodiments of the invention are set forth in the accompanying drawings and in the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims. The word "comprising" in the claims can be replaced by "consisting essentially of" or "consisting of", according to standard practice in patent law.
DESCRIPTION OF THE DRAWINGS
[0038] Figure 1 is an alignment of the amino acid sequence of CeresClone: 1767521 (SEQ ID NO: 483) with homologous and/or orthologous amino acid sequences. In all the alignment figures shown here, a dash in an aligned sequence represents a gap, that is, the lack of an amino acid at that position. Identical amino acids or conserved amino acid substitutions among the aligned sequences are identified by boxes. Figure 1 and the other alignment figures provided here were generated using the MUSCLE program version 3.52.
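As a rough illustration of how the alignment figures can be read, the sketch below takes a few aligned sequences in which a dash marks a gap and reports the columns where every sequence carries the same amino acid. The sequences and the function name are hypothetical, and the sketch covers only identical columns; marking conserved substitutions, as the boxes in the figures do, would additionally require a substitution scoring scheme that is not assumed here.

```python
def identical_columns(aligned_sequences):
    """Return 0-based indices of columns where all sequences share one residue.

    Assumption: all sequences come from one multiple alignment, so they have
    the same length, and '-' marks a gap. Columns containing a gap are never
    reported as identical.
    """
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    columns = []
    for index, column in enumerate(zip(*aligned_sequences)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            columns.append(index)
    return columns


# Hypothetical three-sequence alignment with one gapped column.
alignment = [
    "MKT-ALV",
    "MKTGALV",
    "MRT-ALI",
]
print(identical_columns(alignment))  # [0, 2, 4, 5]
```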
[0039] Figures 2A-2C are an alignment of the amino acid sequence of CeresClone: 1871180 (SEQ ID NO: 562) with homologous and/or orthologous amino acid sequences.
[0040] Figures 3A-3C are an alignment of the amino acid sequence of CeresClone: 240112 (SEQ ID NO: 246) with homologous and/or orthologous amino acid sequences.
[0041] Figure 4 is an alignment of the amino acid sequence of CeresClone: 1764605 (SEQ ID NO: 111) with homologous and/or orthologous amino acid sequences.
[0042] Figures 5A-5E are an alignment of the amino acid sequence of CeresClone: 1776501 (SEQ ID NO: 348) with homologous and/or orthologous amino acid sequences.
[0043] Figures 6A-6I are an alignment of the amino acid sequence of CeresClone: 1789981 (SEQ ID NO: 774) with homologous and/or orthologous amino acid sequences.
[0044] Figures 7A-7G are an alignment of the amino acid sequence of CeresClone: 1804732 (SEQ ID NO: 416) with homologous and/or orthologous amino acid sequences.
[0045] Figures 8A-8E are an alignment of the amino acid sequence of CeresClone: 1807011 (SEQ ID NO: 2) with homologous and/or orthologous amino acid sequences.
[0046] Figures 9A-9I are an alignment of the amino acid sequence of CeresClone: 1888614 (SEQ ID NO: 157) with homologous and/or orthologous amino acid sequences.
[0047] Figures 10A-10G are an alignment of the amino acid sequence of CeresClone: 1900192 (SEQ ID NO: 280) with homologous and/or orthologous amino acid sequences.
[0048] Figures 11A-11D are an alignment of the amino acid sequence of CeresClone: 1955550 (SEQ ID NO: 641) with homologous and/or orthologous amino acid sequences.
[0049] Figure 12 is an alignment of the amino acid sequence of CeresClone: 1955766 (SEQ ID NO: 26) with homologous and/or orthologous amino acid sequences.
DETAILED DESCRIPTION
[0050] This document presents methods and materials related to the modulation of biomass composition (for example, sucrose content or conversion efficiency) in plants. For example, this document presents methods and materials for increasing or decreasing sucrose content and conversion efficiency in plants. In some embodiments, the plants may also have modulated levels of, for example, lignin, modified root architecture, modified herbicide resistance, or modified carotenoid biosynthesis. The methods may include transforming a plant cell with a nucleic acid encoding a polypeptide that modulates biomass composition, where expression of the polypeptide results in a modulated biomass composition. Plant cells produced using such methods can be grown to produce plants with greater or lesser sucrose content and/or conversion efficiency. Such plants can produce forage that is better suited to ruminants. Higher brix levels and/or sucrose content can result in greater palatability as a forage crop. In addition, such plants, and the seeds of such plants, can be used to produce, for example, switchgrass, miscanthus, Sorghum sp., and sugarcane with greater value as feedstocks for producing biofuels.
I. Definitions
[0051] "Accessible carbohydrate" refers to mono- and oligosaccharides released into the aqueous phase after processing of a biomass substrate. The amount of accessible carbohydrate in a substrate is related to the pretreatment and enzymatic saccharification conditions chosen for the saccharification process and to the composition and structure of the initial biomass substrate.
[0052] "Amino acid" refers to one of the twenty biologically occurring amino acids and to synthetic amino acids, including D/L optical isomers.
[0053] "Ash" refers to the inorganic material that contributes to the dry weight of the substrate. The ash content of biomass substrates can be determined using published, standardized methods such as ASTM Standard E1755.
[0054] "Biofuels" includes, but is not limited to, biodiesel, methanol, ethanol, butanol, linear alkanes (C1-C20), branched chain alkanes (C5-C26), mixed alkanes, linear alcohols (C1-C20), branched chain alcohols (C1-C26), linear carboxylic acids (C2-C20), and branched chain carboxylic acids (C2-C26). In addition, ethers, esters, and amides of the aforementioned acids and alcohols, as well as other conjugates of these chemical compounds, may be of interest. Many of these compounds can subsequently be converted by chemical reactions to other high-value, high-volume chemical compounds.
[0055] "Biomass" refers to organic matter. Biomass includes plant material derived from herbaceous and woody energy crops, agricultural food and feed crops, agricultural crop wastes and residues, wood wastes and residues, aquatic plants, and other plant-derived materials. Biomass can also include algae, yard waste, and some municipal waste. Biomass is a heterogeneous and chemically complex renewable resource. The components of biomass include glucans, xylans, fermentable sugars, arabinans, sucrose, lignin, proteins, ash, extractives, ferulate, and acetate.
[0056] "Cell type-preferential promoter" or "tissue-preferential promoter" refers to a promoter that drives expression preferentially in a target cell type or tissue, respectively, but may also lead to some transcription in other cell types or tissues.
[0057] "Control plant" refers to a plant that does not contain the exogenous nucleic acid present in a transgenic plant of interest, but otherwise has a genetic background that is the same as or similar to that of such a transgenic plant. A suitable control plant can be a non-transgenic wild-type plant, a non-transgenic segregant from a transformation experiment, or a transgenic plant that contains an exogenous nucleic acid other than the exogenous nucleic acid of interest.
[0058] "Conversion efficiency" refers to the conversion of biomass substrate to free sugars, fermentable sugars, synthesis gas, biofuel, ethanol, heat, or energy in a laboratory, pilot, or industrial scale process. The relevant parameters of conversion efficiency depend on the type of conversion process employed (biochemical, thermochemical to biofuel, or thermochemical to heat and electricity). Near infrared (NIR) spectra of biomass samples are collected and translated by a NIR model (see below) to predict the conversion properties of the substrate (such as free sugars or accessible carbohydrate), one or more intermediate values that can be used to predict substrate conversion properties (such as recalcitrant carbohydrate content), or one or more processing parameters that are influenced by substrate conversion efficiency (such as biofuel or energy yields).
[0059] The predictions of the conversion properties can be used to calculate the performance characteristics of the substrate in one or more processing methods of interest. Such performance characteristics include saccharification efficiency or sugar yield (Glc, Xyl, Ara, Man, Gal), various enzymatic conditions (type, ratio, loading) for saccharification, pretreatment conditions, total or net energy conversion efficiency, bioenergy yield or bioenergy conversion efficiency, co-product yield or extraction/conversion efficiency, economic value of the original substrate, NOx emissions, protein co-products, or sustainability indicators.
[0060] "Domains" are groups of substantially contiguous amino acids in a polypeptide that can be used to characterize protein families and/or parts of proteins. Such domains have a "fingerprint" or "signature" that can comprise conserved primary sequence, secondary structure, and/or three-dimensional conformation. Domains are generally correlated with specific in vitro and/or in vivo activities. A domain can be from 10 amino acids to 400 amino acids in length, for example, 10 to 50 amino acids, or 25 to 100 amino acids, or 35 to 65 amino acids, or 35 to 55 amino acids, or 45 to 60 amino acids, or 200 to 300 amino acids, or 300 to 400 amino acids.
[0061] "Down-regulation" refers to regulation that decreases the production of expression products (mRNA, polypeptide, or both) relative to basal or native states.
[0062] "Exogenous" with respect to a nucleic acid indicates that the nucleic acid is part of a recombinant nucleic acid construct, or is not in its natural environment. For example, an exogenous nucleic acid can be a sequence from one species introduced into another species, that is, a heterologous nucleic acid. Typically, such an exogenous nucleic acid is introduced into the other species via a recombinant nucleic acid construct. An exogenous nucleic acid can also be a sequence that is native to an organism and has been reintroduced into cells of that organism. An exogenous nucleic acid that includes a native sequence can often be distinguished from the naturally occurring sequence by the presence of non-natural sequences linked to the exogenous nucleic acid, for example, non-native regulatory sequences flanking a native sequence in a recombinant nucleic acid construct. In addition, stably transformed exogenous nucleic acids are typically integrated at positions other than the position where the native sequence is found. It should be noted that an exogenous nucleic acid may have been introduced into a progenitor and not into the cell under consideration. For example, a transgenic plant containing an exogenous nucleic acid may be the progeny of a cross between a stably transformed plant and a non-transgenic plant. Such progeny are considered to contain the exogenous nucleic acid.
[0063] "Expression" refers to the process of converting genetic information of a polynucleotide into RNA through transcription, which is catalyzed by an enzyme, RNA polymerase, and into proteins, through the translation of mRNA by ribosomes.
[0064] "Glucan", "Xylan", and "Arabinan" refer to the anhydro forms of glucose, xylose, and arabinose that are found in cellulose and hemicellulose carbohydrate polymers. Thus, for example, "glucan" refers to a polysaccharide of D-glucose monomers linked by glycosidic bonds. The following compounds are glucans: cellulose (β-1,4-glucan), dextran (α-1,6-glucan), and starch (α-1,4- and α-1,6-glucan).
[0065] "Hemicellulose" is a general term used to refer to cell wall polysaccharides that are not celluloses or pectins. Hemicelluloses contain repeating monomer units of a five-carbon sugar (usually D-xylose or L-arabinose) and/or a six-carbon sugar (D-galactose, D-glucose, and D-mannose). See U.S. Patent Number 7,112,429. Hemicelluloses are typically shorter in chain length than cellulose, and are highly branched. Xylan is usually the structural backbone of the hemicelluloses of hardwoods and grasses, and hydrolysis of these types of biomass releases products with a high content of the five-carbon sugar xylose. Softwood hemicelluloses are most commonly gluco-galactomannans, which have a mannan backbone and release mannose as the main product of hydrolysis. Hemicelluloses usually contain side groups such as acetyl groups, uronic acids, and ferulates.
[0066] "Heterologous polypeptide" as used here refers to a polypeptide that is not a naturally occurring polypeptide in a plant cell, for example, a transgenic Panicum virgatum plant transformed with and expressing the coding sequence for a nitrogen transporter polypeptide from a Zea mays plant.
[0067] "Higher heating value" (HHV) refers to the amount of heat released by a specified quantity of a fuel at an initial temperature of 25 °C, following combustion and the return of the combustion products to a temperature of 25 °C. The HHV is also known as the gross calorific value or gross energy.
[0068] "Isolated nucleic acid" as used herein includes a naturally occurring nucleic acid, provided that one or both of the sequences that immediately flank that nucleic acid in its naturally occurring genome are removed or absent. Thus, an isolated nucleic acid includes, without limitation, a nucleic acid that exists as a purified molecule or a nucleic acid molecule that is incorporated into a vector or virus. A nucleic acid that exists among hundreds to millions of other nucleic acids in, for example, cDNA libraries, genomic libraries, or gel slices containing a restriction digest of genomic DNA, is not considered an isolated nucleic acid.
[0069] "Lignin" refers to a polyphenolic polymeric substance of plant cells, with a complex, cross-linked, highly aromatic structure. Lignin is synthesized in plants mainly from three monolignol monomers, which can be methoxylated to varying degrees: sinapyl alcohol (C11H14O4), which is incorporated into lignin as (S) syringyl units; coniferyl alcohol (C10H12O3), which is incorporated into lignin as (G) guaiacyl units; and p-coumaryl alcohol (C9H10O2), which is incorporated into lignin as (H) p-hydroxyphenyl units. These monomers can be polymerized into lignin by extensive condensation polymerization. The lignin present in different plant varieties can have different syringyl:guaiacyl:p-hydroxyphenyl mass percentages (S:G:H mass percentages). For example, certain varieties of grasses may have lignin composed almost entirely of guaiacyl (G). Lignin is a major structural constituent of plant cells in woody species.
[0070] "Modulation" of the biomass level refers to the change in the level of biomass that is observed as a result of the expression, or transcription, of an exogenous nucleic acid in a plant cell and/or plant. The change in level is measured in relation to the corresponding level in control plants.
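The S:G:H mass percentages mentioned in the lignin definition can be computed as in the following minimal sketch. It assumes that the masses of syringyl, guaiacyl, and p-hydroxyphenyl units in a sample have already been measured by some analytical method; the function name and the example masses are hypothetical.

```python
def sgh_mass_percentages(s_mass: float, g_mass: float, h_mass: float):
    """Return (S, G, H) as mass percentages of the total of the three units.

    Assumption: the three inputs are masses of syringyl (S), guaiacyl (G),
    and p-hydroxyphenyl (H) units in the same sample, in the same unit.
    """
    masses = (s_mass, g_mass, h_mass)
    if any(m < 0 for m in masses):
        raise ValueError("masses must be non-negative")
    total = sum(masses)
    if total <= 0:
        raise ValueError("total mass must be positive")
    return tuple(100.0 * m / total for m in masses)


# Hypothetical sample: 2.0 mg S, 7.0 mg G, 1.0 mg H.
s_pct, g_pct, h_pct = sgh_mass_percentages(2.0, 7.0, 1.0)
print(f"S:G:H = {s_pct:.0f}:{g_pct:.0f}:{h_pct:.0f}")  # S:G:H = 20:70:10
```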
[0071] "NIR model" refers to a series of validated mathematical equations that predict the chemical composition of a sample, based on the NIR spectral data of the sample. The term also refers to a series of validated mathematical equations that predict the saccharification conversion efficiency of a sample, based on the NIR spectral data of the sample. In the case of saccharification conversion efficiency, a different NIR model is developed for each combination of pretreatment conditions and enzyme(s). NIR spectral data are typically obtained from the sample at a plurality of different wavelengths, and the mathematical equations are applied to the spectral data to calculate the predicted value. The calibration equations can be derived by regression from spectroscopic data for substrate samples of the same type, for example, by multiple linear regression, partial least squares, or neural network analysis.
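As an illustration of the multiple linear regression case mentioned above, a calibration equation can be written in the following general form. The symbols are assumptions introduced here for illustration and are not taken from this specification: \hat{y} is the predicted value (for example, a compositional fraction or a conversion efficiency), A(\lambda_i) is the NIR absorbance measured at the i-th selected wavelength, and b_0 through b_k are coefficients fitted to calibration samples of the same substrate type.

```latex
\hat{y} = b_0 + \sum_{i=1}^{k} b_i \, A(\lambda_i)
```

Partial least squares and neural network models replace this linear form with other fitted functions of the same spectral inputs.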
[0072] "NOX emissions" refers to mono-nitrogen oxides (NOx), such as NO and NO2, released into the atmosphere. While oxygen and nitrogen gases do not typically react at ambient temperatures, they can react at elevated temperatures to create various nitrogen oxides, including mono-nitrogen oxides. Mono-nitrogen oxides can also be produced by burning materials that include elemental nitrogen. Mono-nitrogen oxides (NOx) released into the atmosphere can react with volatile organic compounds to produce photochemical smog. Accordingly, NOX emissions can be regulated by several government agencies. Sulfur oxides (SOx), specifically sulfur dioxide, are also often generated in the same processes. SOx emissions are known to contribute to acid rain.
[0073] "Nucleic acid" and "polynucleotide" are used here with the same meaning, and refer to both DNA and RNA, including cDNA, genomic DNA, synthetic DNA, and DNA or RNA containing nucleic acid analogs. A nucleic acid can be double-stranded or single-stranded (i.e., a sense or antisense strand). Non-limiting examples of polynucleotides include genes, gene fragments, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, siRNA, micro-RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, nucleic acid probes, and nucleic acid primers. A polynucleotide can contain unconventional or modified nucleotides.
[0074] "Operationally linked" refers to the positioning of a regulatory region and a sequence to be transcribed in a nucleic acid so that the regulatory region is effective in regulating the transcription or translation of the sequence. For example, to operationally link a coding sequence and a regulatory region, the translation start site of the translational reading frame of the coding sequence is typically positioned between one and about fifty nucleotides downstream of the regulatory region. A regulatory region can, however, be positioned about 5,000 nucleotides upstream of the translational start site, or about 2,000 nucleotides upstream of the translational start site.
"Polypeptide" as used herein refers to a compound of two or more amino acid subunits, amino acid analogues, or other peptidomimetics, regardless of post-translational modifications, for example, phosphorylation or glycosylation. Subunits can be linked by peptide bonds or other bonds, for example, ester or ether bonds. Polypeptides with the full chain, truncated polypeptides, point mutants, insertion mutants, union variants, chimeric proteins, and fragments thereof are encompassed by this definition.
[0076] "Progeny" includes descendants of a particular plant or plant line. The progeny of a given plant includes seeds formed on plants of the F1, F2, F3, F4, F5, F6 and subsequent generations, or seeds formed on plants of the BC1, BC2, BC3, and subsequent generations, or seeds formed on plants of the F1BC1, F1BC2, F1BC3 and subsequent generations. The designation F1 refers to the offspring of a cross between two parents that are genetically distinct. The designations F2, F3, F4, F5 and F6 refer to subsequent generations of self- or sib-pollinated progeny of an F1 plant.
[0077] "Recalcitrant carbohydrate" refers to mono- and oligosaccharides that are not released into the aqueous phase after processing a biomass substrate. This is related to the pre-treatment and enzymatic saccharification conditions chosen for the saccharification process.
[0078] "Regulatory region" refers to a nucleic acid with nucleotide sequences that influence the initiation and rate of transcription or translation, and the stability and/or mobility of a transcription or translation product. Regulatory regions include, without limitation, promoter sequences, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5' and 3' untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, introns, and combinations of these. A regulatory region typically comprises at least one core (basal) promoter. A regulatory region can also include at least one control element, such as an enhancer sequence, an upstream element or an upstream activation region (UAR). For example, a suitable enhancer is a cis-regulatory element (-212 to -154) from the upstream region of the octopine synthase (ocs) gene. Fromm et al., The Plant Cell, 1: 977-984 (1989).
[0079] "Saccharification" refers to the hydrolysis of carbohydrates to the mono- and disaccharides that make up the polymer. For example, saccharification of xylan results in the production of xylose, the monosaccharide that makes up xylan. Saccharification occurs during the biochemical processing of biomass in biorefineries, eventually leading to the production of biofuels such as ethanol.
[0080] The "saccharification efficiency" of a substrate sample refers to the total amount of mono- and disaccharides solubilized by the pretreatment/enzymatic saccharification processes, divided by the maximum theoretical amount of mono- and disaccharides in the biomass sample that could have been released based on compositional analysis, converted into a percentage by multiplying by 100.
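The definition above amounts to a simple ratio. The following is a minimal sketch of that arithmetic; the function name, parameter names, and the choice of grams as the unit are assumptions made here for illustration and do not appear in this specification.

```python
def saccharification_efficiency(solubilized_sugars_g, theoretical_max_sugars_g):
    """Return saccharification efficiency as a percentage.

    solubilized_sugars_g: mono- and disaccharides actually released (hypothetical unit: grams).
    theoretical_max_sugars_g: maximum releasable amount from compositional analysis (same unit).
    """
    if theoretical_max_sugars_g <= 0:
        raise ValueError("theoretical maximum must be positive")
    # Ratio of released sugars to the theoretical maximum, expressed as a percentage.
    return solubilized_sugars_g / theoretical_max_sugars_g * 100.0


# Illustrative values only, not measured data.
print(saccharification_efficiency(30.0, 60.0))
```

Both quantities must be expressed in the same unit for the ratio to be meaningful.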
[0081] "Sustainability indicators" refers to the components of the by-products of biomass processing, such as the expected composition of ash and nutrients for the soil, which can be recycled.
[0082] "Up-regulation" refers to regulation that increases the level of an expression product (mRNA, polypeptide, or both) relative to the basal or native states.
[0083] "Vector" refers to a replicon, such as a plasmid, phage, or cosmid, into which another segment of DNA can be inserted, in order to cause replication of the inserted segment. Generally, a vector is capable of replication when associated with the appropriate control elements. The term "vector" includes cloning and expression vectors, as well as viral vectors and integration vectors. An "expression vector" is a vector that includes a regulatory region.
II. Polypeptides
[0084] The polypeptides described here include polypeptides modulating biomass composition. Polypeptides modulating biomass composition can be effective in modulating the composition of biomass when expressed in a plant or plant cell. Such polypeptides typically contain at least one domain indicative of a biomass composition modulating polypeptide, as described here in greater detail. Biomass composition modulating polypeptides typically also have an HMM score that is greater than 65 as described here in greater detail. In some embodiments, polypeptides modulating biomass composition have an identity greater than 80% in relation to SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823, as described in more detail here.
A. Domains Indicative of Biomass Composition Modulating Polypeptides
[0085] A biomass composition modulator polypeptide may contain a heavy metal-associated domain, which is predicted to be characteristic of a biomass composition modulator polypeptide. SEQ ID NO: 562 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1871180 (SEQ ID NO: 561), which is predicted to encode a polypeptide containing a heavy metal-associated domain. For example, a biomass composition modulator polypeptide may comprise a heavy metal-associated domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 6 to 73 of SEQ ID NO: 562. In some embodiments, a biomass composition modulator polypeptide may comprise a heavy metal-associated domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the heavy metal-associated domain of one or more of the polypeptides shown in SEQ ID NOs: 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, and 638. The heavy metal-associated domains of such sequences are shown in the Sequence Listing. The heavy metal-associated domain is characteristic of proteins that transport heavy metals, and typically contains two conserved cysteines that may be involved in metal binding. See, for example, Rosenzweig et al., Structure Fold Des., 7: 605-617 (1999).
[0086] A biomass composition modulator polypeptide may contain a Myb-like DNA-binding domain, which is predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a Myb-like DNA-binding domain may be useful, for example, to modulate sucrose content or conversion efficiency. SEQ ID NO: 246 shows the amino acid sequence of a Zea mays clone, identified here as CeresClone: 240112 (SEQ ID NO: 245), which is predicted to encode a polypeptide containing a Myb-like DNA-binding domain. For example, a biomass composition modulator polypeptide may comprise a Myb-like DNA-binding domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 212 to 263 of SEQ ID NO: 246. In some embodiments, a biomass composition modulator polypeptide may comprise a Myb-like DNA-binding domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the Myb-like DNA-binding domain of one or more of the polypeptides shown in SEQ ID NOs: 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, and 278. The Myb-like DNA-binding domains of such sequences are shown in the Sequence Listing. The Myb-like DNA-binding domain is found in the Myb protein family, as well as in the SANT family of domains. See Aasland et al., Trends Biochem Sci 21: 87-88 (1996). The SANT domain family specifically recognizes the YAAC(G/T)G sequence.
[0087] A biomass composition modulator polypeptide may contain a DUF1070 domain, which is predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a DUF1070 domain can be useful, for example, to modulate sucrose content. SEQ ID NO: 111 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1764605 (SEQ ID NO: 110), which is predicted to encode a polypeptide containing a DUF1070 domain. For example, a biomass composition modulator polypeptide can comprise a DUF1070 domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 4 to 52 of SEQ ID NO: 111. In some embodiments, a biomass composition modulator polypeptide may comprise a DUF1070 domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the DUF1070 domain of one or more of the polypeptides shown in SEQ ID NOs: 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153 and 155. The DUF1070 domain is a conserved domain found in several short plant proteins, including the arabinogalactan peptide family. See, for example, Schultz et al., Plant Cell 12: 1751-68 (2000).
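The domain paragraphs in this section repeatedly apply an identity threshold of 60 percent or more over a stated residue range. The sketch below shows one common way such a percentage can be computed from two already-aligned domain sequences. The function names, the gap character, and the choice of denominator (all aligned columns that are not gaps in both sequences) are assumptions made here for illustration; the calculation actually used in this specification may differ, and the toy sequences are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity = matching residues / aligned columns * 100,
    where columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only when both residues are identical and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold=60.0):
    """True when the aligned domain regions reach the stated identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy aligned fragments for illustration only.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(meets_identity_threshold("ACDEFGHIKL", "ACDEFGHIKV"))
```

In practice the two inputs would be the domain residue range of a reference sequence and the corresponding aligned region of a candidate, produced by an alignment program.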
[0088] A biomass composition modulator polypeptide may contain a domain of the glycosyl hydrolases family 16 and a xyloglucan endotransglycosylase (XET) domain, which are predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a domain of the glycosyl hydrolases family 16 and such an XET domain can be useful, for example, to modulate sucrose content or conversion efficiency. SEQ ID NO: 348 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1776501 (SEQ ID NO: 347), which is predicted to encode a polypeptide containing a domain of the glycosyl hydrolases family 16 and an XET domain. For example, a biomass composition modulator polypeptide may comprise a domain of the glycosyl hydrolases family 16 and an XET domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 39 to 224 and 246 to 292 of SEQ ID NO: 348, respectively. In some embodiments, a biomass composition modulator polypeptide may comprise a domain of the glycosyl hydrolases family 16 and an XET domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the domain of the glycosyl hydrolases family 16 and the XET domain of one or more of the polypeptides shown in SEQ ID NOs: 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, and 414. The domain of the glycosyl hydrolases family 16 and the XET domain of such sequences are shown in the Sequence Listing. The proteins that belong to the glycosyl hydrolases family 16 are O-glycosyl hydrolases that hydrolyze the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate group. Members of the glycosyl hydrolases family 16 include lichenase, xyloglucan xyloglucosyltransferase, agarase, kappa-carrageenase, endo-beta-1,3-glucanase, endo-beta-1,3-1,4-glucanase, and endo-beta-galactosidase. The XET domain is found at the C-terminus (approximately 60 residues) of plant xyloglucan endo-transglycosylases. Xyloglucan is the predominant hemicellulose in the cell walls of most dicots. Together with cellulose, it forms a network that strengthens the cell wall. XET catalyzes the cleavage of xyloglucan chains and the connection of the newly generated reducing end to the non-reducing end of another xyloglucan chain, thereby loosening the cell wall. See, for example, Schroder et al., Planta, 204: 242-251 (1998).
[0089] A biomass composition modulator polypeptide may contain an Alpha-L-AF_C domain and a CBM_4_9 domain, which are predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such an Alpha-L-AF_C domain and such a CBM_4_9 domain can be useful, for example, to modulate sucrose content or conversion efficiency. SEQ ID NO: 774 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1789981 (SEQ ID NO: 773), which is predicted to encode a polypeptide containing an Alpha-L-AF_C domain and a CBM_4_9 domain. For example, a biomass composition modulator polypeptide may comprise an Alpha-L-AF_C domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 454 to 643 of SEQ ID NO: 774 and a CBM_4_9 domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 71 to 229 of SEQ ID NO: 774. In some embodiments, a biomass composition modulator polypeptide may comprise an Alpha-L-AF_C domain and a CBM_4_9 domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the Alpha-L-AF_C domain and the CBM_4_9 domain of one or more of the polypeptides shown in SEQ ID NOs: 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, and 821. The Alpha-L-AF_C and CBM_4_9 domains of such sequences are shown in the Sequence Listing. The Alpha-L-AF_C domain represents the approximately 200 residues at the C-terminus of bacterial and eukaryotic alpha-L-arabinofuranosidases (EC: 3.2.1.55), which catalyze the hydrolysis of non-reducing terminal alpha-L-arabinofuranoside bonds in polysaccharides containing L-arabinose. The CBM_4_9 domain is a carbohydrate binding domain.
[0090] A biomass composition modulator polypeptide may contain a COBRA domain, which is predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a COBRA domain can be useful, for example, to modulate sucrose content or conversion efficiency. SEQ ID NO: 416 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1804732 (SEQ ID NO: 415), which is predicted to encode a polypeptide containing a COBRA domain. For example, a biomass composition modulator polypeptide can comprise a COBRA domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 45 to 209 of SEQ ID NO: 416. In some embodiments, a biomass composition modulator polypeptide may comprise a COBRA domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the COBRA domain of one or more of the polypeptides shown in SEQ ID NOs: 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, and 481. COBRA domains are found in a family of plant proteins called COBRA-like proteins (COBL). Family members are extracellular glycosyl-phosphatidyl inositol-anchored (GPI-anchored) proteins. The COBRA domain is involved in determining the orientation of cell expansion, probably playing an important role in the deposition of cellulose. It may act by recruiting cellulose-synthesizing complexes to discrete positions on the cell surface. See Roudier et al., Plant Cell. 17 (6): 1749-63 (2005), Epub 2005 Apr 22.
[0091] A biomass composition modulator polypeptide may contain a domain of the glycosyl transferases family 8, which is predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a domain of the glycosyl transferases family 8 may be useful, for example, to modulate sucrose content. SEQ ID NO: 2 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1807011 (SEQ ID NO: 1), which is predicted to encode a polypeptide containing a domain of the glycosyl transferases family 8. For example, a biomass composition modulator polypeptide may comprise a domain of the glycosyl transferases family 8 with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 30 to 253 of SEQ ID NO: 2. In some embodiments, a biomass composition modulator polypeptide may comprise a domain of the glycosyl transferases family 8 with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the domain of the glycosyl transferases family 8 of one or more of the polypeptides shown in SEQ ID NOs: 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, and 24. The domains of the glycosyl transferases family 8 of such sequences are shown in the Sequence Listing. The domains of the glycosyl transferases family 8 are found in a family of enzymes that transfer sugar residues to donor molecules. Members of this family include lipopolysaccharide galactosyltransferase, lipopolysaccharide glucosyltransferase 1, glycogenin glucosyltransferase, and inositol 1-alpha-galactosyltransferase.
[0092] A biomass composition modulator polypeptide may contain a DUF563 domain, which is predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a DUF563 domain can be useful, for example, to modulate sucrose content. SEQ ID NO: 157 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1888614 (SEQ ID NO: 156), which is predicted to encode a polypeptide containing a DUF563 domain. For example, a biomass composition modulator polypeptide can comprise a DUF563 domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 196 to 439 of SEQ ID NO: 157. In some embodiments, a biomass composition modulator polypeptide may comprise a DUF563 domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the DUF563 domain of one or more of the polypeptides shown in SEQ ID NOs: 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, and 244. The DUF563 domains of such sequences are shown in the Sequence Listing. Proteins with a DUF563 domain belong to the glycosyltransferase family 61.
[0093] A biomass composition modulator polypeptide may contain a xyloglucan fucosyltransferase (XG_FTase) domain, which is predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such an XG_FTase domain can be useful, for example, to modulate sucrose content. SEQ ID NO: 280 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1888614 (SEQ ID NO: 279), which is predicted to encode a polypeptide containing an XG_FTase domain. For example, a biomass composition modulator polypeptide may comprise an XG_FTase domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 72 to 574 of SEQ ID NO: 280. In some embodiments, a biomass composition modulator polypeptide may comprise an XG_FTase domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the XG_FTase domain of one or more of the polypeptides shown in SEQ ID NOs: 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, and 346. The XG_FTase domains of such sequences are shown in the Sequence Listing. The XG_FTase domain is found in a fucosyltransferase that transfers the terminal fucosyl residue to xyloglucan (XG), which is the principal load-bearing hemicellulose of dicotyledonous plants. See, for example, Perrin et al., Science, 284: 1976-1979 (1999).
[0094] A biomass composition modulator polypeptide may contain a domain of the glycosyl hydrolases family 16 and a xyloglucan endotransglycosylase (XET) domain, which are predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a domain of the glycosyl hydrolases family 16 and such an XET domain can be useful, for example, to modulate sucrose content or conversion efficiency. SEQ ID NO: 641 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1955550 (SEQ ID NO: 640), which is predicted to encode a polypeptide containing a domain of the glycosyl hydrolases family 16 and an XET domain. For example, a biomass composition modulator polypeptide may comprise a domain of the glycosyl hydrolases family 16 with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 23 to 204 of SEQ ID NO: 641 and an XET domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 228 to 280 of SEQ ID NO: 641. In some embodiments, a biomass composition modulator polypeptide may comprise a domain of the glycosyl hydrolases family 16 and an XET domain with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the domain of the glycosyl hydrolases family 16 and the XET domain of one or more of the polypeptides shown in SEQ ID NOs: 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, or 823. The domain of the glycosyl hydrolases family 16 and the XET domain of such sequences are shown in the Sequence Listing. The domain of the glycosyl hydrolases family 16 and the XET domain are described above in relation to SEQ ID NO: 348.
[0095] A biomass composition modulator polypeptide may contain a domain of the potato I inhibitor family, which is predicted to be characteristic of a biomass composition modulator polypeptide. A polypeptide containing such a domain of the potato I inhibitor family may be useful, for example, to modulate sucrose content. SEQ ID NO: 26 shows the amino acid sequence of a Panicum virgatum clone, identified here as CeresClone: 1955766 (SEQ ID NO: 25), which is predicted to encode a polypeptide containing a domain of the potato I inhibitor family. For example, a biomass composition modulator polypeptide can comprise a domain of the potato I inhibitor family with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to residues 16 to 76 of SEQ ID NO: 26. In some embodiments, a biomass composition modulator polypeptide may comprise a domain of the potato I inhibitor family with a sequence identity of 60 percent or more (for example, 65, 70, 75, 80, 85, 90, 95, 97, 98, 99, or 100 percent) with respect to the potato I inhibitor family domain of one or more of the polypeptides shown in SEQ ID NOs: 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, and 109. The domains of the potato I inhibitor family of such sequences are shown in the Sequence Listing. The members of the potato I inhibitor family are proteinase inhibitors that inhibit peptidases of the S1 and S8 families. See, for example, Rawlings et al., Biochem J., 378 (Pt 3): 705-16 (2004). Inhibitors of this family are small (60 to 90 residues) and have no disulfide bonds. Typically, the inhibitor is a wedge-shaped molecule, its pointed end being formed by the protease-binding loop, which contains the scissile bond. The loop binds tightly to the active site of the protease, and subsequent cleavage of the scissile bond causes inhibition of the enzyme. See Bode et al., EMBO J., 5 (4): 813-8 (1986).
[0096] In some embodiments, a polypeptide modulating biomass composition is truncated at the amino- or carboxy-terminal end of a naturally occurring polypeptide. A truncated polypeptide can retain certain domains of the naturally occurring polypeptide while lacking others. Thus, length variants that are about 5 amino acids shorter or longer typically exhibit the biomass composition modulating activity of a truncated polypeptide. In some embodiments, a truncated polypeptide is a dominant negative polypeptide. The expression in a plant of such a truncated polypeptide confers a difference in the biomass composition of the plant when compared to the corresponding level in a control plant that does not comprise the truncation.
B. Functional Homologs Identified by Reciprocal BLAST
[0097] In some embodiments, one or more functional homologs of a reference biomass composition modulator polypeptide defined by one or more of the Pfam descriptions indicated above are suitable for use as biomass composition modulator polypeptides. A functional homolog is a polypeptide that has sequence similarity to a reference polypeptide, and that carries out one or more of the biochemical or physiological functions of the reference polypeptide. A functional homolog and the reference polypeptide can be naturally occurring polypeptides, and the sequence similarity can be due to convergent or divergent evolutionary events. Accordingly, functional homologs are sometimes referred to in the literature as homologs, or orthologs, or paralogs. Variants of a naturally occurring functional homolog, such as polypeptides encoded by mutants of a wild-type coding sequence, can themselves be functional homologs. Functional homologs can also be created by site-directed mutagenesis of the coding sequence, or by combining domains from the coding sequences for different naturally occurring biomass composition modulating polypeptides ("domain swapping"). The term "functional homolog" is sometimes applied to the nucleic acid that encodes a functionally homologous polypeptide.
[0098] Functional homologs can be identified by analysis of nucleotide and polypeptide sequence alignments. For example, performing a query on a database of nucleotide or polypeptide sequences can identify homologs of biomass composition-modulating polypeptides. Sequence analysis can involve BLAST, reciprocal BLAST, or PSI-BLAST analysis of non-redundant databases using the amino acid sequence of a biomass composition-modulating polypeptide as the reference sequence. The amino acid sequence is, in some instances, deduced from the nucleotide sequence. Those polypeptides in the database that have greater than 40% sequence identity are candidates for further evaluation for suitability as a biomass composition-modulating polypeptide. Amino acid sequence similarity allows for conservative amino acid substitutions, such as substitution of one hydrophobic residue for another or substitution of one polar residue for another. If desired, manual inspection of such candidates can be carried out in order to narrow the number of candidates to be evaluated further. Manual inspection can be performed by selecting those candidates that appear to have domains present in biomass composition-modulating polypeptides, for example, conserved functional domains.
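As an illustration of the reciprocal BLAST screen described in this paragraph, the following minimal Python sketch keeps only reference/candidate pairs that are each other's best hit and that exceed the 40% identity threshold. It assumes the BLAST results have already been parsed into simple tables; the type `HitTable`, the functions `best_hit` and `reciprocal_best_hits`, and all identifiers and identity values are hypothetical and are not taken from this document or from any BLAST distribution.

```python
from typing import Dict, List, Optional, Tuple

# query_id -> list of (subject_id, percent_identity); stands in for parsed BLAST output.
HitTable = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: HitTable, query: str, min_identity: float = 40.0) -> Optional[str]:
    """Return the subject with the highest identity strictly above the cutoff, or None."""
    passing = [(identity, subject) for subject, identity in hits.get(query, [])
               if identity > min_identity]
    if not passing:
        return None
    return max(passing)[1]


def reciprocal_best_hits(forward: HitTable, reverse: HitTable,
                         min_identity: float = 40.0) -> List[Tuple[str, str]]:
    """Keep (reference, candidate) pairs that are each other's best hit."""
    pairs = []
    for reference in forward:
        candidate = best_hit(forward, reference, min_identity)
        if candidate is not None and best_hit(reverse, candidate, min_identity) == reference:
            pairs.append((reference, candidate))
    return sorted(pairs)


# Hypothetical identifiers and identity values, for illustration only.
forward_hits = {"ref_A": [("cand_1", 62.5), ("cand_2", 41.0), ("cand_3", 35.0)]}
reverse_hits = {
    "cand_1": [("ref_A", 61.8), ("other_X", 48.0)],
    "cand_2": [("other_X", 70.0), ("ref_A", 40.5)],
}
print(reciprocal_best_hits(forward_hits, reverse_hits))
```

In this toy input, cand_1 is retained because the reverse search from cand_1 returns ref_A as its best hit, whereas cand_2 would be rejected even if it were the forward best hit, since its own best hit is a different sequence. Candidates retained by such a screen would still be subject to the further evaluation and manual inspection described above.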
[0099] Conserved regions can be identified by locating a region within the primary amino acid sequence of a biomass composition-modulating polypeptide that is a repeated sequence, forms some secondary structure (for example, helices and beta sheets), establishes positively or negatively charged domains, or represents a structural motif or protein domain. See, for example, the Pfam website, which describes consensus sequences for a variety of structural motifs and domains, on the internet at sanger.ac.uk/Software/Pfam/ and pfam.janelia.org/. A description of the information included in the Pfam database is found in Sonnhammer et al., Nucl. Acids Res., 26: 320-322 (1998); Sonnhammer et al., Proteins, 28: 405-420 (1997); and Bateman et al., Nucl. Acids Res., 27: 260-262 (1999). Conserved regions can also be determined by aligning sequences of the same or related polypeptides from closely related species. Closely related species preferably are from the same family. In some embodiments, alignment of sequences from two different species is adequate.
[0100] Typically, polypeptides that exhibit at least about 40% amino acid sequence identity are useful for identifying conserved regions. Conserved regions of related polypeptides exhibit at least 45% amino acid sequence identity (for example, at least 50%, at least 60%, at least 70%, at least 80%, or at least 90% amino acid sequence identity). In some embodiments, a conserved region exhibits at least 92%, 94%, 96%, 98%, or 99% amino acid sequence identity.
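The identity thresholds in this paragraph can be made concrete with a short Python sketch that computes percent identity between two rows of an alignment. It is offered only as an illustration: the function name `percent_identity`, the example sequences, and the convention of skipping columns that contain a gap are assumptions of this sketch, and other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over alignment columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # gapped columns are ignored in this sketch (assumption)
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 5 of the 7 ungapped columns are identical.
identity = percent_identity("MKTAYIAK", "MKSAY-AR")
print(f"{identity:.1f}% identity; meets the 45% threshold: {identity >= 45.0}")
```

Applied to a window of a multiple sequence alignment, the same calculation can be used to flag windows whose pairwise identities all meet a chosen threshold, such as 45% or 90%.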
[0101] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 483 are provided in Figure 1 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8701398 (SEQ ID NO: 485), GI: 21741986 (SEQ ID NO: 486), CeresClone: 488555 (SEQ ID NO: 488), CeresAnnot: 1472210 (SEQ ID NO: 490), CeresClone: 1839543 (SEQ ID NO: 492), GI: 124360895 (SEQ ID NO: 493), CeresClone: 1778664 (SEQ ID NO: 495), CeresClone: 2030878 (SEQ ID NO: 497), GI: 115458882 (SEQ ID NO: 498), CeresAnnot: 8701404 (SEQ ID NO: 500), GI: 115458830 (SEQ ID NO: 501), CeresAnnot: 8701387 (SEQ ID NO: 503), GI: 116310418 (SEQ ID NO: 504), CeresAnnot: 8679943 (SEQ ID NO: 506), CeresAnnot: 8701391 (SEQ ID NO: 508), GI: 46806257 (SEQ ID NO: 509), GI: 125540058 (SEQ ID NO: 510), CeresClone: 1018979 (SEQ ID NO: 512), CeresClone: 1725423 (SEQ ID NO: 514), GI: 115446965 (SEQ ID NO: 515), GI: 125540059 (SEQ ID NO: 516), GI: 38606531 (SEQ ID NO: 517), CeresClone: 1955791 (SEQ ID NO: 519), CeresClone: 2032166 (SEQ ID NO: 521), GI: 125540060 (SEQ ID NO: 522), GI: 46806261 (SEQ ID NO: 523), CeresClone: 100178733 (SEQ ID NO: 525), CeresClone: 351547 (SEQ ID NO: 527), CeresClone: 1906874 (SEQ ID NO: 529), CeresClone: 273420 (SEQ ID NO: 531), CeresAnnot: 8701399 (SEQ ID NO: 533), GI: 125540061 (SEQ ID NO: 534), GI: 115446971 (SEQ ID NO: 535), CeresClone: 1802499 (SEQ ID NO: 537), CeresClone: 1850157 (SEQ ID NO: 539), CeresClone: 1471240 (SEQ ID NO: 541), CeresAnnot: 8679942 (SEQ ID NO: 543), CeresClone: 1024049 (SEQ ID NO: 545), CeresAnnot: 885518 (SEQ ID NO: 547), CeresAnnot: 871243 (SEQ ID NO: 549), CeresAnnot: 1461629 (SEQ ID NO: 551), GI: 27754556 (SEQ ID NO: 552), CeresAnnot: 8679941 (SEQ ID NO: 554), CeresClone: 1846767 (SEQ ID NO: 556), GI: 118489467 (SEQ ID NO: 557), and CeresAnnot: 1480319 (SEQ ID NO: 559).
In some cases, a functional homolog of SEQ ID NO: 483 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 483. In some cases, a functional homolog of SEQ ID NO: 483 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 483 described above or set forth in the Sequence Listing.
[0102] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 562 are provided in Figure 2 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8703443 (SEQ ID NO: 564), GI: 194702514 (SEQ ID NO: 565), CeresClone: 699934 (SEQ ID NO: 567), GI: 32488374 (SEQ ID NO: 568), CeresClone: 1642517 (SEQ ID NO: 570), CeresClone: 1799746 (SEQ ID NO: 572), GI: 224077486 (SEQ ID NO: 573), GI: 83283997 (SEQ ID NO: 574), GI: 171451994 (SEQ ID NO: 575), GI: 15223416 (SEQ ID NO: 576), CeresClone: 1999925 (SEQ ID NO: 578), CeresClone: 100177220 (SEQ ID NO: 580), CeresClone: 1822001 (SEQ ID NO: 582), CeresClone: 570418 (SEQ ID NO: 584), CeresClone: 1998324 (SEQ ID NO: 586), CeresClone: 706252 (SEQ ID NO: 588), GI: 77554837 (SEQ ID NO: 589), GI: 125536425 (SEQ ID NO: 590), CeresAnnot: 1447508 (SEQ ID NO: 592), CeresClone: 1965618 (SEQ ID NO: 594), CeresClone: 1626139 (SEQ ID NO: 596), CeresAnnot: 8640237 (SEQ ID NO: 598), GI: 115450453 (SEQ ID NO: 599), CeresAnnot: 1438634 (SEQ ID NO: 601), GI: 147787209 (SEQ ID NO: 602), GI: 115483110 (SEQ ID NO: 603), CeresClone: 263964 (SEQ ID NO: 605), CeresAnnot: 1449592 (SEQ ID NO: 607), GI: 115461178 (SEQ ID NO: 608), GI: 29124977 (SEQ ID NO: 609), CeresClone: 476087 (SEQ ID NO: 611), CeresClone: 1587840 (SEQ ID NO: 613), CeresClone: 1808797 (SEQ ID NO: 615), CeresClone: 538771 (SEQ ID NO: 617), CeresClone: 1851138 (SEQ ID NO: 619), CeresClone: 1049645 (SEQ ID NO: 621), GI: 92897781 (SEQ ID NO: 622), CeresAnnot: 1487378 (SEQ ID NO: 624), GI: 92897782 (SEQ ID NO: 625), CeresClone: 648917 (SEQ ID NO: 627), CeresClone: 100011205 (SEQ ID NO: 629), GI: 116783342 (SEQ ID NO: 630), CeresAnnot: 1449591 (SEQ ID NO: 632), CeresClone: 521942 (SEQ ID NO: 634), CeresClone: 1653508 (SEQ ID NO: 636), and CeresAnnot: 1487377 (SEQ ID NO: 638).
In some cases, a functional homolog of SEQ ID NO: 562 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 562. In some cases, a functional homolog of SEQ ID NO: 562 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 562 described above or set forth in the Sequence Listing.
[0103] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 246 are provided in Figure 3 and in the Sequence Listing. Such functional homologs include, for example, CeresClone: 1791988 (SEQ ID NO: 248), CeresAnnot: 8632546 (SEQ ID NO: 250), GI: 115455537 (SEQ ID NO: 251), GI: 118486821 (SEQ ID NO: 252), CeresClone: 537690 (SEQ ID NO: 254), CeresAnnot: 880540 (SEQ ID NO: 256), CeresClone: 797459 (SEQ ID NO: 258), CeresClone: 630408 (SEQ ID NO: 260), GI: 125557053 (SEQ ID NO: 261), GI: 125588020 (SEQ ID NO: 262), CeresAnnot: 1733246 (SEQ ID NO: 264), CeresAnnot: 1451294 (SEQ ID NO: 266), CeresAnnot: 1457031 (SEQ ID NO: 268), CeresClone: 100063507 (SEQ ID NO: 270), CeresClone: 560820 (SEQ ID NO: 272), CeresClone: 1104471 (SEQ ID NO: 274), GI: 30690890 (SEQ ID NO: 275), GI: 18402692 (SEQ ID NO: 276), and CeresClone: 2686 (SEQ ID NO: 278). In some cases, a functional homolog of SEQ ID NO: 246 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 246. In some cases, a functional homolog of SEQ ID NO: 246 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 246 described above or set forth in the Sequence Listing.
[0104] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 111 are provided in Figure 4 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8726250 (SEQ ID NO: 113), CeresClone: 899059 (SEQ ID NO: 115), CeresClone: 945132 (SEQ ID NO: 117), GI: 115462673 (SEQ ID NO: 118), CeresClone: 16400 (SEQ ID NO: 120), CeresClone: 1712201 (SEQ ID NO: 122), CeresAnnot: 1524669 (SEQ ID NO: 124), CeresAnnot: 8672987 (SEQ ID NO: 126), CeresClone: 1434951 (SEQ ID NO: 128), CeresClone: 299745 (SEQ ID NO: 130), CeresClone: 323696 (SEQ ID NO: 132), GI: 194695666 (SEQ ID NO: 133), CeresClone: 1771257 (SEQ ID NO: 135), GI: 115445433 (SEQ ID NO: 136), CeresAnnot: 8667876 (SEQ ID NO: 138), GI: 115438957 (SEQ ID NO: 139), CeresClone: 1100814 (SEQ ID NO: 141), CeresClone: 1029710 (SEQ ID NO: 143), CeresClone: 969326 (SEQ ID NO: 145), CeresClone: 100955392 (SEQ ID NO: 147), GI: 225454450 (SEQ ID NO: 148), GI: 116779724 (SEQ ID NO: 149), CeresAnnot: 1447561 (SEQ ID NO: 151), GI: 20149060 (SEQ ID NO: 152), GI: 225462683 (SEQ ID NO: 153), and CeresClone: 595099 (SEQ ID NO: 155). In some cases, a functional homolog of SEQ ID NO: 111 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 111. In some cases, a functional homolog of SEQ ID NO: 111 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 111 described above or set forth in the Sequence Listing.
[0105] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 348 are provided in Figure 5 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8642214 (SEQ ID NO: 350), GI: 115451805 (SEQ ID NO: 351), CeresClone: 890595 (SEQ ID NO: 353), CeresAnnot: 1463701 (SEQ ID NO: 355), CeresClone: 1840970 (SEQ ID NO: 357), CeresClone: 672495 (SEQ ID NO: 359), GI: 225424452 (SEQ ID NO: 360), GI: 15223878 (SEQ ID NO: 361), GI: 13560781 (SEQ ID NO: 362), GI: 6681351 (SEQ ID NO: 363), GI: 116786783 (SEQ ID NO: 364), GI: 125543052 (SEQ ID NO: 365), GI: 124109193 (SEQ ID NO: 366), CeresAnnot: 8653921 (SEQ ID NO: 368), CeresClone: 1995976 (SEQ ID NO: 370), CeresClone: 369312 (SEQ ID NO: 372), GI: 17047034 (SEQ ID NO: 373), GI: 118482018 (SEQ ID NO: 374), GI: 125530964 (SEQ ID NO: 375), GI: 125563629 (SEQ ID NO: 376), GI: 147797772 (SEQ ID NO: 377), CeresClone: 18876 (SEQ ID NO: 379), GI: 125540767 (SEQ ID NO: 380), GI: 115448069 (SEQ ID NO: 381), CeresClone: 683310 (SEQ ID NO: 383), GI: 125605601 (SEQ ID NO: 384), CeresClone: 1922671 (SEQ ID NO: 386), CeresClone: 100961902 (SEQ ID NO: 388), CeresAnnot: 1447077 (SEQ ID NO: 390), CeresClone: 1643790 (SEQ ID NO: 392), GI: 125580663 (SEQ ID NO: 393), GI: 116785331 (SEQ ID NO: 394), CeresAnnot: 1485570 (SEQ ID NO: 396), CeresAnnot: 8681188 (SEQ ID NO: 398), CeresClone: 1818189 (SEQ ID NO: 400), CeresClone: 100861631 (SEQ ID NO: 402), CeresAnnot: 8671232 (SEQ ID NO: 404), CeresClone: 1813525 (SEQ ID NO: 406), GI: 15222593 (SEQ ID NO: 407), GI: 42795460 (SEQ ID NO: 408), CeresClone: 1828819 (SEQ ID NO: 410), CeresAnnot: 1460297 (SEQ ID NO: 412), GI: 225424689 (SEQ ID NO: 413), and GI: 76786474 (SEQ ID NO: 414).
In some cases, a functional homolog of SEQ ID NO: 348 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 348. In some cases, a functional homolog of SEQ ID NO: 348 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 348 described above or set forth in the Sequence Listing.
[0106] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 774 are provided in Figure 6 and in the Sequence Listing. Such functional homologs include, for example, GI: 115483997 (SEQ ID NO: 775), GI: 13398414 (SEQ ID NO: 776), GI: 33151175 (SEQ ID NO: 777), GI: 119507455 (SEQ ID NO: 778), CeresClone: 549408 (SEQ ID NO: 780), GI: 37777015 (SEQ ID NO: 781), GI: 157313302 (SEQ ID NO: 782), GI: 157072586 (SEQ ID NO: 783), CeresAnnot: 1506572 (SEQ ID NO: 785), GI: 16417958 (SEQ ID NO: 786), CeresAnnot: 556941 (SEQ ID NO: 788), GI: 225440254 (SEQ ID NO: 789), CeresClone: 1753603 (SEQ ID NO: 791), CeresClone: 236733 (SEQ ID NO: 793), CeresClone: 1786359 (SEQ ID NO: 795), GI: 115487150 (SEQ ID NO: 796), CeresAnnot: 8682811 (SEQ ID NO: 798), GI: 13398412 (SEQ ID NO: 799), GI: 116310992 (SEQ ID NO: 800), GI: 38347003 (SEQ ID NO: 801), GI: 116739148 (SEQ ID NO: 802), GI: 22324432 (SEQ ID NO: 803), CeresAnnot: 1453426 (SEQ ID NO: 805), CeresAnnot: 8657414 (SEQ ID NO: 807), GI: 108707861 (SEQ ID NO: 808), CeresAnnot: 1528070 (SEQ ID NO: 810), GI: 22327075 (SEQ ID NO: 811), GI: 50507838 (SEQ ID NO: 812), GI: 168060089 (SEQ ID NO: 813), GI: 160890886 (SEQ ID NO: 814), GI: 189464007 (SEQ ID NO: 815), GI: 154492683 (SEQ ID NO: 816), GI: 146300858 (SEQ ID NO: 817), GI: 150008552 (SEQ ID NO: 818), GI: 86142284 (SEQ ID NO: 819), GI: 148269769 (SEQ ID NO: 820), and GI: 170288456 (SEQ ID NO: 821). In some cases, a functional homolog of SEQ ID NO: 774 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 774.
In some cases, a functional homolog of SEQ ID NO: 774 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 774 described above or set forth in the Sequence Listing.
[0107] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 416 are provided in Figure 7 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8656625 (SEQ ID NO: 418), GI: 162462515 (SEQ ID NO: 419), GI: 75133694 (SEQ ID NO: 420), CeresClone: 829440 (SEQ ID NO: 422), GI: 118488472 (SEQ ID NO: 423), GI: 90657534 (SEQ ID NO: 424), CeresClone: 1237946 (SEQ ID NO: 426), GI: 225456557 (SEQ ID NO: 427), CeresAnnot: 1355066 (SEQ ID NO: 429), GI: 38194917 (SEQ ID NO: 430), GI: 116788824 (SEQ ID NO: 431), CeresClone: 1848658 (SEQ ID NO: 433), GI: 116790012 (SEQ ID NO: 434), CeresClone: 570485 (SEQ ID NO: 436), GI: 125559102 (SEQ ID NO: 437), CeresClone: 1957107 (SEQ ID NO: 439), CeresClone: 1781794 (SEQ ID NO: 441), GI: 115453531 (SEQ ID NO: 442), CeresClone: 285169 (SEQ ID NO: 444), CeresAnnot: 1450186 (SEQ ID NO: 446), CeresClone: 1806851 (SEQ ID NO: 448), GI: 38194916 (SEQ ID NO: 449), GI: 225451792 (SEQ ID NO: 450), GI: 225456559 (SEQ ID NO: 451), GI: 224124236 (SEQ ID NO: 452), CeresClone: 17250 (SEQ ID NO: 454), CeresAnnot: 1363625 (SEQ ID NO: 456), CeresAnnot: 1450185 (SEQ ID NO: 458), GI: 125552171 (SEQ ID NO: 459), GI: 115463639 (SEQ ID NO: 460), CeresAnnot: 1809854 (SEQ ID NO: 462), GI: 162462330 (SEQ ID NO: 463), CeresAnnot: 1326475 (SEQ ID NO: 465), GI: 125559101 (SEQ ID NO: 466), CeresAnnot: 8632643 (SEQ ID NO: 468), CeresClone: 1546455 (SEQ ID NO: 470), CeresClone: 1788775 (SEQ ID NO: 472), GI: 162462156 (SEQ ID NO: 473), GI: 125545759 (SEQ ID NO: 474), CeresClone: 236876 (SEQ ID NO: 476), CeresAnnot: 8640602 (SEQ ID NO: 478), GI: 30090032 (SEQ ID NO: 479), GI: 38230578 (SEQ ID NO: 480), and GI: 115453533 (SEQ ID NO: 481).
In some cases, a functional homolog of SEQ ID NO: 416 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 416. In some cases, a functional homolog of SEQ ID NO: 416 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 416 described above or set forth in the Sequence Listing.
[0108] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 2 are provided in Figure 8 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8701928 (SEQ ID NO: 4), CeresClone: 630287 (SEQ ID NO: 6), GI: 115447391 (SEQ ID NO: 7), GI: 225453032 (SEQ ID NO: 8), CeresClone: 1919301 (SEQ ID NO: 10), CeresAnnot: 883070 (SEQ ID NO: 12), CeresAnnot: 1469624 (SEQ ID NO: 14), GI: 168065791 (SEQ ID NO: 15), CeresClone: 1887777 (SEQ ID NO: 17), GI: 57834149 (SEQ ID NO: 18), GI: 116310214 (SEQ ID NO: 19), GI: 18087513 (SEQ ID NO: 20), GI: 147841543 (SEQ ID NO: 21), GI: 168014382 (SEQ ID NO: 22), and CeresAnnot: 8462062 (SEQ ID NO: 24). In some cases, a functional homolog of SEQ ID NO: 2 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 2. In some cases, a functional homolog of SEQ ID NO: 2 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 2 described above or set forth in the Sequence Listing.
[0109] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 157 are provided in Figure 9 and in the Sequence Listing. Such functional homologs include, for example, GI: 56409850 (SEQ ID NO: 158), CeresAnnot: 8740887 (SEQ ID NO: 160), GI: 162460428 (SEQ ID NO: 161), GI: 115453815 (SEQ ID NO: 162), GI: 56409844 (SEQ ID NO: 163), GI: 31339690 (SEQ ID NO: 164), GI: 9294073 (SEQ ID NO: 165), CeresAnnot: 1473325 (SEQ ID NO: 167), GI: 31296713 (SEQ ID NO: 168), CeresClone: 1925376 (SEQ ID NO: 170), GI: 56409848 (SEQ ID NO: 171), GI: 125544555 (SEQ ID NO: 172), GI: 115445881 (SEQ ID NO: 173), CeresAnnot: 8674833 (SEQ ID NO: 175), CeresClone: 914572 (SEQ ID NO: 177), CeresAnnot: 8659084 (SEQ ID NO: 179), CeresClone: 1781320 (SEQ ID NO: 181), GI: 53791307 (SEQ ID NO: 182), CeresAnnot: 8659080 (SEQ ID NO: 184), GI: 212275650 (SEQ ID NO: 185), CeresClone: 1818693 (SEQ ID NO: 187), CeresClone: 508386 (SEQ ID NO: 189), GI: 53791309 (SEQ ID NO: 190), CeresAnnot: 8659051 (SEQ ID NO: 192), CeresClone: 1862153 (SEQ ID NO: 194), CeresClone: 1902844 (SEQ ID NO: 196), GI: 212275101 (SEQ ID NO: 197), CeresClone: 1844210 (SEQ ID NO: 199), CeresAnnot: 8658929 (SEQ ID NO: 201), GI: 125555301 (SEQ ID NO: 202), CeresClone: 825530 (SEQ ID NO: 204), GI: 115444075 (SEQ ID NO: 205), CeresClone: 1748522 (SEQ ID NO: 207), GI: 115445889 (SEQ ID NO: 208), CeresAnnot: 8671335 (SEQ ID NO: 210), GI: 53791308 (SEQ ID NO: 211), CeresClone: 1899806 (SEQ ID NO: 213), CeresClone: 1726616 (SEQ ID NO: 215), GI: 162460449 (SEQ ID NO: 216), CeresClone: 1770027 (SEQ ID NO: 218), CeresAnnot: 1467806 (SEQ ID NO: 220), GI: 55792425 (SEQ ID NO: 221), GI: 56409862 (SEQ ID NO: 222), GI: 115482674 (SEQ ID NO: 223), CeresClone: 815962 (SEQ ID NO: 225), GI: 56409860 (SEQ ID NO: 226), CeresAnnot: 8670072 (SEQ ID NO: 228), CeresAnnot: 1473327 (SEQ ID NO: 230), CeresClone: 1726182 (SEQ ID NO: 232), CeresAnnot: 8734902 (SEQ ID NO: 234), CeresAnnot: 8741882 (SEQ ID NO: 236), CeresClone: 761431 (SEQ ID NO: 238), CeresAnnot: 8678791 (SEQ ID NO: 240), CeresClone: 845464 (SEQ ID NO: 242), and CeresClone: 1726076 (SEQ ID NO: 244). In some cases, a functional homolog of SEQ ID NO: 157 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 157. In some cases, a functional homolog of SEQ ID NO: 157 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 157 described above or set forth in the Sequence Listing.
[0110] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 280 are provided in Figure 10 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8681689 (SEQ ID NO: 282), GI: 226529851 (SEQ ID NO: 283), GI: 115448865 (SEQ ID NO: 284), GI: 154163107 (SEQ ID NO: 285), GI: 147817757 (SEQ ID NO: 286), CeresClone: 1925709 (SEQ ID NO: 288), GI: 15227566 (SEQ ID NO: 289), GI: 20138107 (SEQ ID NO: 290), CeresClone: 934069 (SEQ ID NO: 292), CeresAnnot: 8681691 (SEQ ID NO: 294), GI: 46805726 (SEQ ID NO: 295), GI: 125541250 (SEQ ID NO: 296), GI: 115448869 (SEQ ID NO: 297), GI: 115467048 (SEQ ID NO: 298), GI: 51090521 (SEQ ID NO: 299), GI: 125554524 (SEQ ID NO: 300), CeresAnnot: 8735787 (SEQ ID NO: 302), GI: 15227563 (SEQ ID NO: 303), CeresAnnot: 8681690 (SEQ ID NO: 305), GI: 15223062 (SEQ ID NO: 306), CeresAnnot: 8735782 (SEQ ID NO: 308), GI: 115467046 (SEQ ID NO: 309), GI: 125554519 (SEQ ID NO: 310), GI: 125596466 (SEQ ID NO: 311), GI: 20138442 (SEQ ID NO: 312), CeresAnnot: 1448326 (SEQ ID NO: 314), CeresAnnot: 8735776 (SEQ ID NO: 316), GI: 125554515 (SEQ ID NO: 317), GI: 154163097 (SEQ ID NO: 318), CeresAnnot: 8673445 (SEQ ID NO: 320), GI: 115445521 (SEQ ID NO: 321), CeresAnnot: 1448328 (SEQ ID NO: 323), CeresAnnot: 1437779 (SEQ ID NO: 325), GI: 15226507 (SEQ ID NO: 326), GI: 154163099 (SEQ ID NO: 327), GI: 93139696 (SEQ ID NO: 328), GI: 154163101 (SEQ ID NO: 329), CeresAnnot: 1448327 (SEQ ID NO: 331), CeresAnnot: 8681687 (SEQ ID NO: 333), CeresAnnot: 1437782 (SEQ ID NO: 335), GI: 20138443 (SEQ ID NO: 336), GI: 15226501 (SEQ ID NO: 337), GI: 125541240 (SEQ ID NO: 338), GI: 115458656 (SEQ ID NO: 339), GI: 125548499 (SEQ ID NO: 340), CeresAnnot: 8654550 (SEQ ID NO: 342), CeresAnnot: 8701112 (SEQ ID NO: 344), and CeresClone: 1530993 (SEQ ID NO: 346).
In some cases, a functional homolog of SEQ ID NO: 280 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 280. In some cases, a functional homolog of SEQ ID NO: 280 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 280 described above or set forth in the Sequence Listing.
[0111] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 641 are provided in Figure 11 and in the Sequence Listing. Such functional homologs include, for example, CeresAnnot: 8744420 (SEQ ID NO: 643), CeresClone: 331385 (SEQ ID NO: 645), GI: 115469712 (SEQ ID NO: 646), GI: 1890577 (SEQ ID NO: 647), GI: 51039064 (SEQ ID NO: 648), GI: 14330332 (SEQ ID NO: 649), GI: 147854712 (SEQ ID NO: 650), GI: 157352236 (SEQ ID NO: 651), GI: 118722746 (SEQ ID NO: 652), GI: 8886867 (SEQ ID NO: 653), GI: 115334952 (SEQ ID NO: 654), CeresClone: 1789502 (SEQ ID NO: 656), CeresClone: 1805428 (SEQ ID NO: 658), CeresClone: 1724099 (SEQ ID NO: 660), CeresClone: 1724817 (SEQ ID NO: 662), CeresClone: 1804995 (SEQ ID NO: 664), CeresClone: 1446366 (SEQ ID NO: 666), CeresClone: 1054422 (SEQ ID NO: 668), CeresClone: 263803 (SEQ ID NO: 670), CeresClone: 1821034 (SEQ ID NO: 672), CeresClone: 1806021 (SEQ ID NO: 674), CeresClone: 1727689 (SEQ ID NO: 676), GI: 115469720 (SEQ ID NO: 677), CeresAnnot: 8744425 (SEQ ID NO: 679), GI: 212275237 (SEQ ID NO: 680), CeresClone: 1724271 (SEQ ID NO: 682), CeresClone: 247073 (SEQ ID NO: 684), CeresClone: 1020658 (SEQ ID NO: 686), GI: 1890575 (SEQ ID NO: 687), GI: 225446111 (SEQ ID NO: 688), GI: 225446115 (SEQ ID NO: 689), GI: 147854714 (SEQ ID NO: 690), GI: 68532877 (SEQ ID NO: 691), GI: 147779866 (SEQ ID NO: 692), CeresClone: 100062911 (SEQ ID NO: 694), GI: 225446117 (SEQ ID NO: 695), CeresClone: 1832719 (SEQ ID NO: 697), CeresClone: 1793297 (SEQ ID NO: 699), CeresClone: 1848637 (SEQ ID NO: 701), GI: 225446103 (SEQ ID NO: 702), CeresAnnot: 1362908 (SEQ ID NO: 704), CeresClone: 100064069 (SEQ ID NO: 706), CeresAnnot: 1469128 (SEQ ID NO: 708), CeresClone: 656868 (SEQ ID NO: 710), CeresClone: 1793334 (SEQ ID NO: 712), GI: 29500891 (SEQ ID NO: 713), CeresClone: 1895226 (SEQ ID NO: 715), GI: 8886865 (SEQ ID NO: 716), CeresAnnot: 878947 (SEQ ID NO: 718), CeresClone: 1045431 (SEQ ID NO: 720), GI: 22947852 (SEQ ID NO: 721), CeresClone: 1855067 (SEQ ID NO: 723), GI: 17064792 (SEQ ID NO: 724), CeresClone: 662227 (SEQ ID NO: 726), GI: 225446109 (SEQ ID NO: 727), CeresClone: 522574 (SEQ ID NO: 729), GI: 115334954 (SEQ ID NO: 730), CeresClone: 581426 (SEQ ID NO: 732), GI: 124109191 (SEQ ID NO: 733), CeresAnnot: 1471882 (SEQ ID NO: 735), GI: 34809190 (SEQ ID NO: 736), GI: 29500893 (SEQ ID NO: 737), CeresAnnot: 1452398 (SEQ ID NO: 739), GI: 124109199 (SEQ ID NO: 740), CeresAnnot: 1478206 (SEQ ID NO: 742), CeresAnnot: 1445599 (SEQ ID NO: 744), CeresAnnot: 1452397 (SEQ ID NO: 746), GI: 19911573 (SEQ ID NO: 747), GI: 124109181 (SEQ ID NO: 748), GI: 22327914 (SEQ ID NO: 749), GI: 42795468 (SEQ ID NO: 750), GI: 42795462 (SEQ ID NO: 751), CeresAnnot: 1466060 (SEQ ID NO: 753), CeresAnnot: 8461207 (SEQ ID NO: 755), CeresAnnot: 1506985 (SEQ ID NO: 757), GI: 3901012 (SEQ ID NO: 758), CeresAnnot: 1443040 (SEQ ID NO: 760), GI: 90811697 (SEQ ID NO: 761), CeresAnnot: 1443041 (SEQ ID NO: 763), GI: 157358970 (SEQ ID NO: 764), GI: 90656516 (SEQ ID NO: 765), GI: 577066 (SEQ ID NO: 766), GI: 90656520 (SEQ ID NO: 767), GI: 88683124 (SEQ ID NO: 768), GI: 90656518 (SEQ ID NO: 769), CeresAnnot: 1482565 (SEQ ID NO: 771), GI: 15238891 (SEQ ID NO: 772), and Ceres Clone ID No. 933491 (SEQ ID NO: 823). In some cases, a functional homolog of SEQ ID NO: 641 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 641. In some cases, a functional homolog of SEQ ID NO: 641 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 641 described above or set forth in the Sequence Listing.
[0112] Examples of amino acid sequences of functional homologs of the polypeptide set forth in SEQ ID NO: 26 are provided in Figure 12 and in the Sequence Listing. Such functional homologs include, for example, CeresClone: 570179 (SEQ ID NO: 28), GI: 54290293 (SEQ ID NO: 29), GI: 1617121 (SEQ ID NO: 30), CeresAnnot: 8724383 (SEQ ID NO: 32), CeresClone: 896724 (SEQ ID NO: 34), CeresClone: 607452 (SEQ ID NO: 36), GI: 37904392 (SEQ ID NO: 37), CeresClone: 1870473 (SEQ ID NO: 39), CeresClone: 2026564 (SEQ ID NO: 41), CeresClone: 2004365 (SEQ ID NO: 43), CeresClone: 2020677 (SEQ ID NO: 45), CeresClone: 2039538 (SEQ ID NO: 47), CeresClone: 844611 (SEQ ID NO: 49), GI: 125526847 (SEQ ID NO: 50), CeresClone: 597887 (SEQ ID NO: 52), GI: 58396949 (SEQ ID NO: 53), CeresClone: 684778 (SEQ ID NO: 55), CeresClone: 699511 (SEQ ID NO: 57), CeresClone: 1803377 (SEQ ID NO: 59), CeresClone: 1888961 (SEQ ID NO: 61), CeresClone: 897331 (SEQ ID NO: 63), CeresClone: 617775 (SEQ ID NO: 65), GI: 20513866 (SEQ ID NO: 66), CeresAnnot: 8724387 (SEQ ID NO: 68), CeresClone: 1804405 (SEQ ID NO: 70), GI: 48093396 (SEQ ID NO: 71), GI: 108862602 (SEQ ID NO: 72), GI: 115488400 (SEQ ID NO: 73), CeresClone: 759663 (SEQ ID NO: 75), CeresClone: 1801827 (SEQ ID NO: 77), GI: 48093418 (SEQ ID NO: 78), GI: 48093360 (SEQ ID NO: 79), CeresClone: 1457620 (SEQ ID NO: 81), GI: 48093370 (SEQ ID NO: 82), CeresClone: 639183 (SEQ ID NO: 84), CeresClone: 1453564 (SEQ ID NO: 86), CeresClone: 1531954 (SEQ ID NO: 88), CeresClone: 1460371 (SEQ ID NO: 90), CeresClone: 1627479 (SEQ ID NO: 92), CeresClone: 992630 (SEQ ID NO: 94), CeresClone: 685480 (SEQ ID NO: 96), GI: 75994159 (SEQ ID NO: 97), CeresAnnot: 8724380 (SEQ ID NO: 99), GI: 48093378 (SEQ ID NO: 100), GI: 75994143 (SEQ ID NO: 101), GI: 75994153 (SEQ ID NO: 102), CeresAnnot: 8724381 (SEQ ID NO: 104), GI: 75994157 (SEQ ID NO: 105), CeresClone: 730301 (SEQ ID NO: 107), and CeresAnnot: 8724388 (SEQ ID NO: 109). In some cases, a functional homolog of SEQ ID NO: 26 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 26. In some cases, a functional homolog of SEQ ID NO: 26 has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one or more of the functional homologs of SEQ ID NO: 26 described above or set forth in the Sequence Listing.
[0113] The identification of conserved regions in a biomass composition-modulating polypeptide facilitates production of variants of biomass composition-modulating polypeptides. Variants of biomass composition-modulating polypeptides typically have 10 or fewer conservative amino acid substitutions within the primary amino acid sequence, for example, 7 or fewer conservative amino acid substitutions, 5 or fewer conservative amino acid substitutions, or between 1 and 5 conservative amino acid substitutions. A useful variant polypeptide can be constructed based on one of the alignments set forth in Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11, or Figure 12, and/or homologs identified in the Sequence Listing. Such a polypeptide includes the conserved regions, arranged in the order depicted in the Figure from the amino-terminal end to the carboxy-terminal end. Such a polypeptide can also include zero, one, or more than one amino acid in positions marked by dashes. When no amino acids are present at positions marked by dashes, the length of such a polypeptide is the sum of the amino acid residues in all conserved regions. When amino acids are present at all positions marked by dashes, such a polypeptide has a length equal to the sum of the amino acid residues in all conserved regions and all dashes.
C. Functional Homologs Identified by HMMER
[0114] In some embodiments, useful biomass composition-modulating polypeptides include those that fit a Hidden Markov Model based on the polypeptides shown in any of Figures 1-12. A Hidden Markov Model (HMM) is a statistical model of a consensus sequence for a group of functional homologs. See Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, UK (1998). An HMM is generated by the program HMMER 2.3.2 with default program parameters, using the sequences of the group of functional homologs as input. The multiple sequence alignment is generated by ProbCons (Do et al., Genome Res., 15(2): 330-40 (2005)) version 1.11 using a set of default parameters: -c, consistency REPS of 2; -ir, iterative refinement REPS of 100; -pre, pre-training REPS of 0. ProbCons is a public domain software program provided by Stanford University.
[0115] The default parameters for building an HMM (hmmbuild) are as follows: the default "architecture prior" (archpri) used by MAP architecture construction is 0.85, and the default cutoff threshold (idlevel) used to determine the effective sequence number is 0.62. HMMER 2.3.2 was released on October 3, 2003 under the GNU General Public License, and is available from various sources on the internet such as hmmer.janelia.org; hmmer.wustl.edu; and fr.com/hmmer232/. Hmmbuild outputs the model as a text file.
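As an illustration only, the alignment and model-building steps described above could be assembled as command lines along the following lines. This is a minimal sketch: the file names are hypothetical, the executables are assumed to be installed under the names `probcons` and `hmmbuild`, and the option spellings should be checked against the locally installed ProbCons 1.11 and HMMER 2.3.2 documentation before use.

```python
# Sketch: build (but do not run) the command lines for the stated defaults.
# File names such as "homologs.fasta" are hypothetical placeholders.

def probcons_command(fasta_path: str) -> list[str]:
    """ProbCons call with the defaults quoted in the text:
    consistency 2, iterative refinement 100, pre-training 0."""
    return ["probcons", "-c", "2", "-ir", "100", "-pre", "0", fasta_path]


def hmmbuild_command(hmm_out: str, alignment_path: str) -> list[str]:
    """hmmbuild call with the quoted defaults: architecture prior 0.85
    and identity cutoff 0.62 for the effective sequence number."""
    return [
        "hmmbuild",
        "--archpri", "0.85",
        "--idlevel", "0.62",
        hmm_out,
        alignment_path,
    ]


if __name__ == "__main__":
    print(" ".join(probcons_command("homologs.fasta")))
    print(" ".join(hmmbuild_command("homologs.hmm", "homologs.aln")))
```

Because the quoted values are the program defaults, passing them explicitly is redundant; doing so simply records the settings alongside the analysis.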
[0116] The HMM for a group of functional homologs can be used to determine the likelihood that a given biomass composition-modulating polypeptide sequence is a better fit to that particular HMM than to a null HMM generated using a group of sequences that are not structurally or functionally related. The likelihood that a given biomass composition-modulating polypeptide sequence is a better fit to an HMM than to a null HMM is indicated by the HMM score, a number generated when the candidate sequence is fitted to the HMM profile using the HMMER hmmsearch program. The following default parameters are used when running hmmsearch: the default E-value cutoff (E) is 10.0; the default score cutoff (T) is negative infinity; the default number of sequences in a database is the actual number of sequences in the database; the default E-value cutoff for the per-domain ranked hit list (domE) is infinity; and the default score cutoff for the per-domain ranked hit list (domT) is negative infinity. A high HMM score indicates a greater likelihood that a given sequence carries out one or more of the biochemical or physiological functions of the polypeptides used to generate the HMM. A high HMM score is at least 20, and is often higher. Slight variations in the HMM score of a particular sequence can occur due to factors such as the order in which sequences are processed for alignment by multiple sequence alignment algorithms such as the ProbCons program. Nevertheless, such variation in the HMM score is minor.
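The idea of scoring a sequence against a model relative to a null model can be illustrated with a deliberately simplified log-odds calculation. The sketch below is not the HMMER algorithm: it assumes an ungapped, position-specific profile and a toy four-letter alphabet, and all probabilities are invented for illustration. It only shows why a sequence that resembles the model receives a positive score in bits while an unrelated sequence does not.

```python
import math

# Toy null model: four residue types, all equally likely (hypothetical values).
BACKGROUND = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}

# Toy two-column profile favouring "A" at each position (hypothetical values).
PROFILE = [
    {"A": 0.5, "C": 0.25, "D": 0.125, "E": 0.125},
    {"A": 0.5, "C": 0.25, "D": 0.125, "E": 0.125},
]


def log_odds_bits(sequence, profile, background):
    """Sum over positions of log2(P(residue | model) / P(residue | null))."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    score = 0.0
    for residue, column in zip(sequence, profile):
        score += math.log2(column[residue] / background[residue])
    return score


if __name__ == "__main__":
    for candidate in ("AA", "CC", "DD"):
        print(candidate, log_odds_bits(candidate, PROFILE, BACKGROUND))
```

A real profile HMM additionally models insertions and deletions, so scores reported by hmmsearch will not match this toy calculation.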
[0117] The biomass composition-modulating polypeptides discussed below fit the indicated HMM with an HMM score greater than 65 (for example, greater than 70, 80, 90, 100, 120, 140, 200, 300, 500, 1000, 1500, or 2000). In some embodiments, the HMM score of a biomass composition-modulating polypeptide discussed below is about 50%, 60%, 70%, 80%, 90%, or 95% of the HMM score of a functional homolog provided in the Sequence Listing of this application. In some embodiments, a biomass composition-modulating polypeptide discussed below fits the indicated HMM with an HMM score greater than 210, and has a domain indicative of a biomass composition-modulating polypeptide. In some embodiments, a biomass composition-modulating polypeptide discussed below fits the indicated HMM with an HMM score greater than 210, and has 65% or greater sequence identity (for example, 75%, 80%, 85%, 90%, 95%, or 100% sequence identity) to an amino acid sequence shown in any of Figures 1-12.
[0118] Examples of polypeptides that have HMM scores greater than 84 (for example, greater than 100, 120, 140, 160, 180, 200, 220, 240, 250, 260, 270, 280, or 290) when fitted to an HMM generated from the amino acid sequences shown in Figure 1 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, and 559.
[0119] Examples of polypeptides that have HMM scores greater than 120 (for example, greater than 125, 130, 140, 150, 160, 170, 180, 200, 220, 240, 260, 280, 300, or 315) when fitted to an HMM generated from the amino acid sequences shown in Figure 2 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, and 638.
[0120] Examples of polypeptides that have HMM scores greater than 200 (for example, greater than 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 975, or 1000) when fitted to an HMM generated from the amino acid sequences shown in Figure 3 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, and 278.
[0121] Examples of polypeptides that have HMM scores greater than 93 (for example, greater than 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, or 145) when fitted to an HMM generated from the amino acid sequences shown in Figure 4 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, and 155.
[0122] Examples of polypeptides that have HMM scores greater than 387 (for example, greater than 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, or 920) when fitted to an HMM generated from the amino acid sequences shown in Figure 5 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, and 414.
[0123] Examples of polypeptides that have HMM scores greater than 315 (for example, greater than 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1620, 1630, or 1640) when fitted to an HMM generated from the amino acid sequences shown in Figure 6 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, and 821.
[0124] Examples of polypeptides that have HMM scores greater than 914 (for example, greater than 920, 940, 960, 980, 1000, 1020, 1040, 1060, 1080, 1090, or 1100) when fitted to an HMM generated from the amino acid sequences shown in Figure 7 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, and 481.
[0125] Examples of polypeptides that have HMM scores greater than 659 (for example, greater than 675, 700, 800, 900, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1425, or 1440) when fitted to an HMM generated from the amino acid sequences shown in Figure 8 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, and 24.
[0126] Examples of polypeptides that have HMM scores greater than 406 (for example, greater than 420, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1420, or 1440) when fitted to an HMM generated from the amino acid sequences shown in Figure 9 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, and 244.
[0127] Examples of polypeptides that have HMM scores greater than 640 (for example, greater than 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, or 1510) when fitted to an HMM generated from the amino acid sequences shown in Figure 10 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, and 346.
[0128] Examples of polypeptides that have HMM scores greater than 234 (for example, greater than 250, 275, 300, 325, 350, 375, 400, 424, 450, 475, 500, 525, 550, 575, 600, 626, 650, 675, 700, or 720) when fitted to an HMM generated from the amino acid sequences shown in Figure 11 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, and 823.
[0129] Examples of polypeptides that have HMM scores greater than 131 (for example, greater than 135, 140, 145, 150, 151, 152, 153, or 154) when fitted to an HMM generated from the amino acid sequences shown in Figure 12 are identified in the Sequence Listing of this application. Such polypeptides include, for example, SEQ ID NOs: 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, and 109.
D. Percent Identity
[0130] In some embodiments, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to one of the amino acid sequences set forth in SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823. Polypeptides having such a percent sequence identity often have a domain indicative of a biomass composition-modulating polypeptide and/or have an HMM score greater than 65, as discussed above. Amino acid sequences of biomass composition-modulating polypeptides having at least 80% sequence identity to one of the amino acid sequences set forth in SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823 are provided in Figures 1-12 and in the Sequence Listing.
[0131] "Percent sequence identity" refers to the degree of sequence identity between any given reference sequence, for example, SEQ ID NO: 1, and a candidate biomass composition-modulating sequence. A candidate sequence typically has a length that is from 80 percent to 200 percent of the length of the reference sequence, for example, 82, 85, 87, 89, 90, 93, 95, 97, 99, 100, 105, 110, 115, 120, 130, 140, 150, 160, 170, 180, 190, or 200 percent of the length of the reference sequence. A percent identity for any candidate nucleic acid or polypeptide relative to a reference nucleic acid or polypeptide can be determined as follows. A reference sequence (for example, a nucleic acid sequence or an amino acid sequence) is aligned to one or more candidate sequences using the computer program ClustalW (version 1.83, default parameters), which allows alignments of nucleic acid or polypeptide sequences to be carried out across their entire length (global alignment). Chenna et al., Nucleic Acids Res., 31(13): 3497-500 (2003).
[0132] ClustalW calculates the best match between a reference sequence and one or more candidate sequences, and aligns them so that identities, similarities, and differences can be determined. Gaps of one or more residues can be inserted into a reference sequence, a candidate sequence, or both, to maximize sequence alignments. For fast pairwise alignment of nucleic acid sequences, the following default parameters are used: word size: 2; window size: 4; scoring method: percentage; number of top diagonals: 4; and gap penalty: 5. For multiple alignment of nucleic acid sequences, the following parameters are used: gap opening penalty: 10.0; gap extension penalty: 5.0; and weight transitions: yes. For fast pairwise alignment of protein sequences, the following default parameters are used: word size: 1; window size: 5; scoring method: percentage; number of top diagonals: 5; and gap penalty: 3. For multiple alignment of protein sequences, the following parameters are used: weight matrix: blosum; gap opening penalty: 10.0; gap extension penalty: 0.05; hydrophilic gaps: on; hydrophilic residues: Gly, Pro, Ser, Asn, Asp, Gln, Glu, Arg, and Lys; residue-specific gap penalties: on.
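For reference, the numeric parameters listed above can be collected into a small configuration and rendered as command-line arguments. The sketch below is illustrative only: the input file name is hypothetical, the executable name `clustalw` and the flag spellings are assumptions to be verified against the help output of the installed ClustalW 1.83, and the on/off settings (hydrophilic gaps, residue-specific gap penalties, transition weighting) are simply left at the program defaults rather than passed explicitly.

```python
# Numeric settings quoted in the text, keyed by assumed ClustalW flag names.
NUCLEIC_ACID = {
    "KTUPLE": 2, "WINDOW": 4, "SCORE": "PERCENT", "TOPDIAGS": 4, "PAIRGAP": 5,
    "GAPOPEN": 10.0, "GAPEXT": 5.0,
}
PROTEIN = {
    "KTUPLE": 1, "WINDOW": 5, "SCORE": "PERCENT", "TOPDIAGS": 5, "PAIRGAP": 3,
    "MATRIX": "BLOSUM", "GAPOPEN": 10.0, "GAPEXT": 0.05,
}


def clustalw_args(infile: str, seq_type: str) -> list[str]:
    """Build (but do not run) a ClustalW argument list for 'protein' or 'dna'."""
    if seq_type == "protein":
        settings, type_flag = PROTEIN, "PROTEIN"
    elif seq_type == "dna":
        settings, type_flag = NUCLEIC_ACID, "DNA"
    else:
        raise ValueError("seq_type must be 'protein' or 'dna'")
    args = ["clustalw", f"-INFILE={infile}", f"-TYPE={type_flag}"]
    args += [f"-{key}={value}" for key, value in settings.items()]
    return args


if __name__ == "__main__":
    print(" ".join(clustalw_args("candidates.fasta", "protein")))
```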
[0133] The ClustalW output is a sequence alignment that reflects the relationship between sequences. ClustalW can be run, for example, at the Baylor College of Medicine Search Launcher site on the internet (searchlauncher.bcm.tmc.edu/multi-align/multi-align.html) and at the European Bioinformatics Institute site (ebi.ac.uk/clustalw).
[0134] To determine the percent identity of a candidate nucleic acid or amino acid sequence to a reference sequence, the sequences are aligned using ClustalW, the number of identical matches in the alignment is divided by the length of the reference sequence, and the result is multiplied by 100. It is noted that the percent identity value can be rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2.
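The calculation just described can be sketched in a few lines. The example below assumes that the two sequences have already been aligned (for example by ClustalW) and are supplied as equal-length strings with "-" marking gaps; the function names are hypothetical. Decimal arithmetic with half-up rounding is used so that values such as 78.15 round to 78.2 as stated, which ordinary binary floating-point rounding does not guarantee.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value: Decimal) -> Decimal:
    """Round to one decimal place, with 5 in the hundredths place rounding up."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(aligned_reference: str, aligned_candidate: str) -> Decimal:
    """Identical matches divided by the reference length, multiplied by 100."""
    if len(aligned_reference) != len(aligned_candidate):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same residue (not a gap).
    matches = sum(
        1
        for ref, cand in zip(aligned_reference, aligned_candidate)
        if ref == cand and ref != "-"
    )
    # Reference length is the number of reference residues, excluding gaps.
    reference_length = sum(1 for ref in aligned_reference if ref != "-")
    if reference_length == 0:
        raise ValueError("reference sequence is empty")
    return round_to_tenth(Decimal(matches) * 100 / Decimal(reference_length))


if __name__ == "__main__":
    print(percent_identity("AC-DEF", "ACGDEA"))
    print(round_to_tenth(Decimal("78.15")))
```

In this toy alignment the reference has five residues, four of which are matched by the candidate.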
[0135] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 483. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 483 are provided in Figure 1 and in the Sequence Listing.
[0136] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 562. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 562 are provided in Figure 2 and in the Sequence Listing.
[0137] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 246. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 246 are provided in Figure 3 and in the Sequence Listing.
[0138] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 111. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 111 are provided in Figure 4 and in the Sequence Listing.
[0139] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 348. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 348 are provided in Figure 5 and in the Sequence Listing.
[0140] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 774. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 774 are provided in Figure 6 and in the Sequence Listing.
[0141] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 416. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 416 are provided in Figure 7 and in the Sequence Listing.
[0142] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 2. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 2 are provided in Figure 8 and in the Sequence Listing.
[0143] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 157. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 157 are provided in Figure 9 and in the Sequence Listing.
[0144] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 280. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 280 are provided in Figure 10 and in the Sequence Listing.
[0145] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 641. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 641 are provided in Figure 11 and in the Sequence Listing.
[0146] In some cases, a biomass composition-modulating polypeptide has an amino acid sequence with at least 45% sequence identity, for example, 50%, 52%, 56%, 59%, 61%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the amino acid sequence set forth in SEQ ID NO: 26. Amino acid sequences of polypeptides having greater than 45% sequence identity to the polypeptide set forth in SEQ ID NO: 26 are provided in Figure 12 and in the Sequence Listing.
E. Other Sequences
[0147] It should be appreciated that a biomass composition-modulating polypeptide can include additional amino acids that are not involved in biomass modulation, and thus such a polypeptide can be longer than would otherwise be the case. For example, a biomass composition-modulating polypeptide can include a purification tag, a chloroplast transit peptide, a mitochondrial transit peptide, an amyloplast peptide, or a leader sequence added to the amino or carboxy terminus. In some embodiments, a biomass composition-modulating polypeptide includes an amino acid sequence that functions as a reporter, for example, a green fluorescent protein or a yellow fluorescent protein.
III. Nucleic Acids
[0148] The nucleic acids described here include nucleic acids that are effective in modulating biomass composition when transcribed in a plant or plant cell. Such nucleic acids include, without limitation, those that encode a biomass composition-modulating polypeptide and those that can be used to inhibit expression of a biomass composition-modulating polypeptide via a nucleic acid-based method.
A. Nucleic Acids Encoding Biomass Composition-Modulating Polypeptides
[0149] Nucleic acids encoding biomass composition-modulating polypeptides are described here. Examples of such nucleic acids include SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822, as described in more detail below.
A nucleic acid can also be a fragment that is at least 40% (for example, at least 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 99%) of the length of the full-length nucleic acid set forth in SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822.
[0150] A biomass composition-modulating nucleic acid can comprise the nucleotide sequence set forth in SEQ ID NO: 482. Alternatively, a biomass composition-modulating nucleic acid can be a variant of the nucleic acid having the nucleotide sequence set forth in SEQ ID NO: 482. For example, a biomass composition-modulating nucleic acid can have a nucleotide sequence with at least 80% sequence identity, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the nucleotide sequence set forth in SEQ ID NO: 482.
[0151] A biomass composition-modulating nucleic acid can comprise the nucleotide sequence set forth in SEQ ID NO: 561. Alternatively, a biomass composition-modulating nucleic acid can be a variant of the nucleic acid having the nucleotide sequence set forth in SEQ ID NO: 561. For example, a biomass composition-modulating nucleic acid can have a nucleotide sequence with at least 80% sequence identity, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the nucleotide sequence set forth in SEQ ID NO: 561.
[0152] A biomass composition-modulating nucleic acid can comprise the nucleotide sequence set forth in SEQ ID NO: 245. Alternatively, a biomass composition-modulating nucleic acid can be a variant of the nucleic acid having the nucleotide sequence set forth in SEQ ID NO: 245. For example, a biomass composition-modulating nucleic acid can have a nucleotide sequence with at least 80% sequence identity, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity, to the nucleotide sequence set forth in SEQ ID NO: 245.
[0153] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 110. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 110. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 110.
[0154] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 347. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 347. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 347.
[0155] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 773. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 773. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 773.
[0156] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 415. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 415. For example, a biomass composition modulating nucleic acid may have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, nucleotide sequence shown in SEQ ID NO: 415.
[0157] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 1. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 1. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 1.
[0158] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 156. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 156. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 156.
[0159] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 279. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 279. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 279.
[0160] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 640. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 640. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 640.
[0161] A biomass composition modulating nucleic acid may comprise the nucleotide sequence shown in SEQ ID NO: 25. Alternatively, a biomass composition modulating nucleic acid may be a variant of the nucleic acid with the nucleotide sequence shown in SEQ ID NO. : 25. For example, a biomass composition modulating nucleic acid can have a nucleotide sequence with a sequence identity of at least 80%, for example, 81%, 85%, 90%, 95%, 97%, 98%, or 99% sequential identity, in relation to the nucleotide sequence shown in SEQ ID NO: 25.
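As an illustration of the 80% identity threshold recited in the preceding paragraphs, the following minimal Python sketch computes percent identity over a pairwise alignment that has already been produced by some alignment tool. The function names and the convention used here (matching columns divided by informative alignment columns) are assumptions made for illustration only; the definition of percent sequence identity given elsewhere in this specification governs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an existing pairwise alignment (gaps written as '-').

    Assumed convention: matches / alignment columns, ignoring columns
    in which both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only when both residues are identical non-gaps
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this convention, a candidate variant would be aligned to the reference sequence first, and the aligned strings would then be passed to `meets_identity_threshold` with the chosen cutoff (80, 85, 90, and so on).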
[0162] Isolated nucleic acid molecules can be produced by standard techniques. For example, polymerase chain reaction (PCR) techniques can be used to obtain an isolated nucleic acid containing a nucleotide sequence described herein. PCR can be used to amplify specific sequences from DNA as well as RNA, including sequences from total genomic DNA or total cellular RNA. Various PCR methods are described, for example, in PCR Primer: A Laboratory Manual, Dieffenbach and Dveksler, eds., Cold Spring Harbor Laboratory Press, 1995. In general, sequence information from the ends of the region of interest or beyond is used to design oligonucleotide primers that are identical or similar in sequence to opposite strands of the template to be amplified. Various PCR strategies are also available by which site-specific nucleotide sequence modifications can be introduced into a template nucleic acid. Isolated nucleic acids can also be synthesized chemically, either as a single nucleic acid molecule (for example, using automated DNA synthesis in the 3' to 5' direction with phosphoramidite technology) or as a series of oligonucleotides. For example, one or more pairs of long oligonucleotides (for example, >100 nucleotides) containing the desired sequence can be synthesized, with each pair containing a short complementary segment (for example, about 15 nucleotides) so that a duplex is formed when the oligonucleotide pair is annealed. DNA polymerase is used to extend the oligonucleotides, resulting in a single double-stranded nucleic acid molecule, which can then be ligated into a vector. Isolated nucleic acids of the invention can also be obtained by mutagenesis of, for example, naturally occurring DNA.
B. Use of nucleic acids to modulate polypeptide expression
i. Expression of a biomass composition modulating polypeptide
[0163] A nucleic acid encoding one or more of the biomass composition modulating polypeptides described herein can be used to express the polypeptide in a plant species of interest, typically by transforming a plant cell with a nucleic acid having the coding sequence for the polypeptide operably linked in sense orientation to one or more regulatory regions. It will be appreciated that, because of the degeneracy of the genetic code, a number of nucleic acids can encode a particular biomass composition modulating polypeptide; that is, for many amino acids, there is more than one nucleotide triplet that serves as a codon for the amino acid. Thus, the codons in the coding sequence for a given biomass composition modulating polypeptide can be modified so that optimal expression in a particular plant species is obtained, using appropriate codon bias tables for that species.
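The codon degeneracy described above can be sketched in Python as follows. The sketch builds the standard genetic code, lists the synonymous codons for an amino acid, and re-encodes a coding sequence using a table of preferred codons without changing the encoded polypeptide. The `preferred` mapping passed in is purely hypothetical and is not a real codon bias table for any plant species; all function names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons that encode the given one-letter amino acid ('*' = stop)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(cds: str) -> str:
    """Translate a DNA coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict[str, str]) -> str:
    """Swap each codon for the preferred synonymous codon, where one is listed."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        choice = preferred.get(aa, codon)  # keep the original codon if no preference
        if CODON_TABLE[choice] != aa:
            raise ValueError(f"preferred codon {choice} does not encode {aa}")
        out.append(choice)
    return "".join(out)
```

For example, with a hypothetical preference `{"A": "GCC", "K": "AAG", "L": "CTG"}`, the sequence `ATGGCTAAATTATAA` is re-encoded so that its nucleotide sequence changes while `translate` returns the same polypeptide for both versions.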
[0164] In some cases, the expression of a biomass composition modulating polypeptide inhibits one or more functions of an endogenous polypeptide. For example, a nucleic acid that encodes a dominant negative polypeptide can be used to inhibit protein function. A dominant negative polypeptide is typically mutated or truncated with respect to an endogenous wild-type polypeptide, and its presence in a cell inhibits one or more functions of the wild-type polypeptide in that cell; that is, the dominant negative polypeptide is genetically dominant and confers a loss of function. The mechanism by which a dominant negative polypeptide confers such a phenotype can vary, but it usually involves a protein-protein interaction or a protein-DNA interaction. For example, a dominant negative polypeptide may be an enzyme that is truncated with respect to a native wild-type enzyme, so that the truncated polypeptide retains domains involved in binding to a first protein but lacks domains involved in binding to a second protein. The truncated polypeptide is therefore unable to correctly modulate the activity of the second protein. See, for example, US 2007/0056058. As another example, a point mutation that results in a non-conservative amino acid substitution in a catalytic domain can result in a dominant negative polypeptide. See, for example, US 2005/032221. As another example, a dominant negative polypeptide may be a transcription factor that is truncated with respect to a native wild-type transcription factor, so that the truncated polypeptide retains the DNA-binding domain(s) but lacks the activation domain(s). Such a truncated polypeptide can inhibit the wild-type transcription factor from binding DNA, thereby inhibiting the activation of transcription.
ii. Inhibition of expression of a biomass composition modulating polypeptide
[0165] The polynucleotides and recombinant constructs described herein can be used to inhibit the expression of a biomass composition modulating polypeptide in a plant species of interest. See, for example, Matzke and Birchler, Nature Reviews Genetics 6:24-35 (2005); Akashi et al., Nature Reviews Mol. Cell Biology 6:413-422 (2005); Mittal, Nature Reviews Genetics 5:355-365 (2004); and Nature Reviews RNA interference collection, October 2005 on the internet at nature.com/reviews/focus/mai. Various nucleic acid-based methods, including antisense RNA, ribozyme-directed RNA cleavage, post-transcriptional gene silencing (PTGS), for example, RNA interference (RNAi), and transcriptional gene silencing (TGS), are known for inhibiting gene expression in plants. Suitable polynucleotides include full-length nucleic acids that encode biomass composition modulating polypeptides, or fragments of such full-length nucleic acids. In some embodiments, a complement of the full-length nucleic acid or a fragment thereof can be used. Typically, a fragment is at least 10 nucleotides, for example, at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 35, 40, 50, 80, 100, 200, 500 nucleotides or more. In general, greater homology can be used to compensate for the use of a shorter sequence.
[0166] Antisense technology is one well-known method. In this method, the nucleic acid of a gene to be repressed is cloned and operably linked to a regulatory region and to a transcription termination sequence so that the antisense strand of RNA is transcribed. The recombinant construct is then transformed into plants, as described herein, and the antisense strand of RNA is produced. The nucleic acid need not be the entire sequence of the gene to be repressed, but it is typically substantially complementary to at least a portion of the sense strand of the gene to be repressed.
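The relationship between the sense strand of a gene and the antisense (reverse-orientation) RNA transcribed from such a construct can be sketched as a reverse-complement operation. The following Python sketch is illustrative only; the function names are assumptions and nothing here reflects a particular cloning protocol.

```python
# Translation table mapping each DNA base to its Watson-Crick partner
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence, read 5' to 3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """RNA complementary to the sense strand, written 5' to 3' with U for T."""
    return reverse_complement(sense_dna).upper().replace("T", "U")
```

A fragment of the sense strand, rather than the whole gene, can be passed to `antisense_rna`, which matches the statement above that the nucleic acid need not be the entire sequence of the gene to be repressed.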
[0167] In another method, a nucleic acid can be transcribed into a ribozyme, or catalytic RNA, that affects the expression of an mRNA. See U.S. Patent No. 6,423,885. Ribozymes can be designed to pair specifically with virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating the target RNA. Heterologous nucleic acids can encode ribozymes designed to cleave particular mRNA transcripts, thereby preventing the expression of a polypeptide. Hammerhead ribozymes are useful for destroying particular mRNAs, although various ribozymes that cleave mRNA at site-specific recognition sequences can be used. Hammerhead ribozymes cleave mRNAs at locations dictated by flanking regions that form complementary base pairs with the target mRNA. The only requirement is that the target mRNA contains a 5'-UG-3' nucleotide sequence. The construction and production of hammerhead ribozymes is known in the art. See, for example, U.S. Patent No. 5,254,678 and WO 02/46449 and references cited therein. Hammerhead ribozyme sequences can be embedded in a stable RNA such as a transfer RNA (tRNA) to improve cleavage efficiency in vivo. Perriman et al., Proc. Natl. Acad. Sci. USA, 92(13):6175-6179 (1995); de Feyter and Gaudron, Methods in Molecular Biology, Vol. 74, Chapter 43, "Expressing Ribozymes in Plants", edited by Turner, P.C., Humana Press Inc., Totowa, NJ. RNA endoribonucleases that have been described, such as the one that occurs naturally in Tetrahymena thermophila, may be useful. See, for example, US Patent Nos. 4,987,071 and 6,423,885.
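As a rough illustration of the stated requirement that the target contain a 5'-UG-3' sequence, the following Python sketch scans an mRNA for that dinucleotide and returns the target regions on either side of each occurrence, to which the flanking arms of a ribozyme would be complementary. This is a simplified sketch under stated assumptions: the function names and the default arm length of 8 are hypothetical, and the code does not design or validate an actual ribozyme.

```python
def find_motif_sites(mrna: str, motif: str = "UG") -> list[int]:
    """0-based start positions of every occurrence of the motif in the mRNA."""
    rna = mrna.upper().replace("T", "U")
    return [i for i in range(len(rna) - len(motif) + 1) if rna.startswith(motif, i)]


def flanking_regions(mrna: str, site: int, arm_length: int = 8,
                     motif_length: int = 2) -> tuple[str, str]:
    """Target sequence immediately 5' and 3' of the motif at the given site.

    arm_length is an illustrative assumption, not a value from the text.
    """
    rna = mrna.upper().replace("T", "U")
    left = rna[max(0, site - arm_length):site]
    right = rna[site + motif_length:site + motif_length + arm_length]
    return left, right
```

Each candidate site returned by `find_motif_sites` could then be inspected, together with its flanking regions, when choosing where complementary arms would pair with the target.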
[0168] PTGS, for example, RNAi, can also be used to inhibit the expression of a gene. For example, a construct can be prepared that includes a sequence that is transcribed into an RNA that can anneal to itself, for example, a double-stranded RNA with a stem-loop structure. In some embodiments, one strand of the stem portion of a double-stranded RNA comprises a sequence that is similar or identical to the sense coding sequence, or a fragment thereof, of a biomass composition modulating polypeptide, and that is from about 10 nucleotides to about 2500 nucleotides in length. The length of the sequence that is similar or identical to the sense coding sequence can be from 10 nucleotides to 500 nucleotides, from 15 nucleotides to 300 nucleotides, from 20 nucleotides to 100 nucleotides, or from 25 nucleotides to 100 nucleotides. The other strand of the stem portion of a double-stranded RNA comprises a sequence that is similar or identical to the antisense strand, or a fragment thereof, of a sequence encoding the biomass composition modulating polypeptide, and may have a length that is shorter than, equal to, or greater than the corresponding length of the sense sequence. In some cases, one strand of the stem portion of a double-stranded RNA comprises a sequence that is similar or identical to the 3' or 5' untranslated region, or a fragment thereof, of an mRNA encoding a biomass composition modulating polypeptide, and the other strand of the stem portion of the double-stranded RNA comprises a sequence that is similar or identical to the sequence that is complementary to the 3' or 5' untranslated region, respectively, or a fragment thereof, of the mRNA encoding the biomass composition modulating polypeptide. In other embodiments, one strand of the stem portion of a double-stranded RNA comprises a sequence that is similar or identical to an intron sequence, or a fragment thereof, in the pre-mRNA encoding a biomass composition modulating polypeptide, and the other strand of the stem portion comprises a sequence that is similar or identical to the sequence that is complementary to the intron sequence, or a fragment thereof, in the pre-mRNA.
[0169] The loop portion of a double-stranded RNA can be from 3 nucleotides to 5000 nucleotides, for example, from 3 nucleotides to 25 nucleotides, from 15 nucleotides to 1000 nucleotides, from 20 nucleotides to 500 nucleotides, or from 25 nucleotides to 200 nucleotides. The loop portion of the RNA can include an intron or a fragment thereof. A double-stranded RNA can have zero, one, two, three, four, five, six, seven, eight, nine, ten, or more stem-loop structures.
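The stem-loop arrangement described in the two preceding paragraphs can be sketched as a DNA template consisting of a sense fragment, a loop, and the reverse complement of the sense fragment, so that the transcript can fold back and anneal to itself. The following Python sketch is illustrative only; the function names are assumptions, and the length checks simply mirror the approximate ranges stated above (about 10 to about 2500 nucleotides for the stem strand, 3 to 5000 nucleotides for the loop).

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def hairpin_template(sense_fragment: str, loop: str) -> str:
    """DNA template: sense arm + loop + antisense arm (all 5' to 3')."""
    sense = sense_fragment.upper()
    spacer = loop.upper()
    if not 10 <= len(sense) <= 2500:
        raise ValueError("stem strand should be about 10 to about 2500 nucleotides")
    if not 3 <= len(spacer) <= 5000:
        raise ValueError("loop should be 3 to 5000 nucleotides")
    return sense + spacer + reverse_complement(sense)
```

In this sketch the two arms are exactly complementary and of equal length; the text above also allows arms that are merely similar and of unequal length, which this simplified function does not model.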
[0170] A construct including a sequence that is operably linked to a regulatory region and a transcription termination sequence, and that is transcribed into an RNA that can form a double-stranded RNA, is transformed into plants as described herein. Methods for using RNAi to inhibit expression of a gene are known to those skilled in the art. See, for example, US Patent Nos. 5,034,323; 6,326,527; 6,452,067; 6,573,099; 6,753,139; and 6,777,588. See also WO 97/01952; WO 98/53083; WO 99/32619; WO 98/36083; and U.S. Patent Publications 20030175965, 20030175783, 20040214330, and 20030180945.
[0171] Constructs containing regulatory regions operably linked to nucleic acid molecules in sense orientation can also be used to inhibit the expression of a gene. The transcription product may be similar or identical to the sense coding sequence, or a fragment thereof, of a biomass composition modulating polypeptide. The transcription product may also be unpolyadenylated, lack a 5' cap structure, or contain an unspliceable intron. Methods for inhibiting gene expression using a full-length cDNA, as well as a partial cDNA sequence, are known in the art. See, for example, U.S. Patent No. 5,231,020.
[0172] In some embodiments, a construct containing a nucleic acid with at least one strand that is a template for both sense and antisense sequences that are complementary to each other is used to inhibit the expression of a gene. The sense and antisense sequences can be part of a larger nucleic acid molecule or they can be part of separate nucleic acid molecules with sequences that are not complementary. The sense or antisense sequence can be a sequence that is identical or complementary to the sequence of an mRNA, to the 3' or 5' untranslated region of an mRNA, or to an intron in a pre-mRNA encoding a biomass composition modulating polypeptide, or a fragment of such sequences. In some embodiments, the sense or antisense sequence is identical or complementary to a sequence in the regulatory region that directs the transcription of the gene that encodes a biomass composition modulating polypeptide. In each case, the sense sequence is the sequence that is complementary to the antisense sequence.
[0173] The sense and antisense sequences can be longer than about 10 nucleotides (for example, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides). For example, an antisense sequence can be 21 or 22 nucleotides in length. Typically, the sense and antisense sequences are between about 15 nucleotides and about 30 nucleotides in length, for example, from about 18 nucleotides to about 28 nucleotides, or from about 21 nucleotides to about 25 nucleotides.
[0174] In some embodiments, an antisense sequence is a sequence complementary to an mRNA sequence, or a fragment thereof, that encodes a biomass composition modulating polypeptide described herein. The sense sequence complementary to the antisense sequence can be a sequence present in the mRNA of the biomass composition modulating polypeptide. Typically, sense and antisense sequences are designed to correspond to a 15-30 nucleotide sequence of a target mRNA so that the level of that target mRNA is reduced.
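The idea of choosing a short stretch of a target mRNA, 15 to 30 nucleotides long, together with its complementary partner can be sketched as a sliding window over the mRNA. The following Python sketch enumerates every window of a chosen length and pairs it with its complementary sequence. The function name and the default length of 21 are illustrative assumptions; the sketch applies no ranking or specificity filtering of candidate windows.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def sense_antisense_pairs(mrna: str, length: int = 21) -> list[tuple[int, str, str]]:
    """Enumerate (start, window, complement) for every window in the mRNA.

    window     : stretch of the target mRNA, 5' to 3'
    complement : sequence complementary to that stretch, 5' to 3'
    """
    if not 15 <= length <= 30:
        raise ValueError("length should be between 15 and 30 nucleotides")
    rna = mrna.upper().replace("T", "U")
    pairs = []
    for start in range(len(rna) - length + 1):
        window = rna[start:start + length]
        complement = window.translate(_RNA_COMPLEMENT)[::-1]
        pairs.append((start, window, complement))
    return pairs
```

A target shorter than the requested window yields an empty list, and each returned tuple records where in the target mRNA the window begins.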
[0175] In some embodiments, a construct containing a nucleic acid with at least one strand that is a template for more than one sense sequence (for example, two or more sense sequences) can be used to inhibit the expression of a gene. Likewise, a construct containing a nucleic acid with at least one strand that is a template for more than one antisense sequence (for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more antisense sequences) can be used to inhibit the expression of a gene. For example, a construct can contain a nucleic acid with at least one strand that is a template for two sense sequences and two antisense sequences. The multiple sense sequences can be identical or different, and the multiple antisense sequences can be identical or different. For example, a construct may have a nucleic acid with one strand that is a template for two identical sense sequences and two identical antisense sequences that are complementary to the two identical sense sequences. Alternatively, an isolated nucleic acid can have one strand that is a template for (1) two identical sense sequences 20 nucleotides in length, (2) one antisense sequence that is complementary to the two identical sense sequences 20 nucleotides in length, (3) a sense sequence 30 nucleotides in length, and (4) three identical antisense sequences that are complementary to the sense sequence 30 nucleotides in length. The constructs provided herein can be designed to have a suitable arrangement of sense and antisense sequences. For example, two identical sense sequences can be followed by two identical antisense sequences or can be positioned between two identical antisense sequences.
[0176] A nucleic acid with at least one strand that is a template for one or more sense and/or antisense sequences can be operably linked to a regulatory region to direct the transcription of an RNA molecule containing the sense and/or antisense sequence(s). In addition, such a nucleic acid may be operably linked to a transcription termination sequence, such as the terminator of the nopaline synthase (nos) gene. In some cases, two regulatory regions can direct the transcription of two transcripts: one from the upper strand, and one from the lower strand. See, for example, Yan et al., Plant Physiol., 141:1508-1518 (2006). The two regulatory regions can be the same or different. The two transcripts can form double-stranded RNA molecules that induce degradation of the target RNA. In some cases, a nucleic acid can be positioned within a T-DNA or a plant-derived transfer DNA (P-DNA) so that the left and right border sequences of the T-DNA, or the left and right border-like sequences of the P-DNA, flank, or are on either side of, the nucleic acid. See US 2006/0265788. The nucleic acid sequence between the two regulatory regions can be from about 15 to about 300 nucleotides in length. In some embodiments, the nucleic acid sequence between the two regulatory regions is from about 15 to about 200 nucleotides in length, from about 15 to about 100 nucleotides in length, from about 15 to about 50 nucleotides in length, from about 18 to about 50 nucleotides in length, from about 18 to about 40 nucleotides in length, from about 18 to about 30 nucleotides in length, or from about 18 to about 25 nucleotides in length.
[0177] In some nucleic acid-based methods for inhibiting gene expression in plants, a suitable nucleic acid can be a nucleic acid analog. Nucleic acid analogs can be modified at the base moiety, the sugar moiety, or the phosphate backbone to improve, for example, the stability, hybridization, or solubility of the nucleic acid. Modifications at the base moiety include deoxyuridine for deoxythymidine, and 5-methyl-2'-deoxycytidine and 5-bromo-2'-deoxycytidine for deoxycytidine. Modifications of the sugar moiety include modification of the 2' hydroxyl of the ribose sugar to form 2'-O-methyl or 2'-O-allyl sugars. The deoxyribose phosphate backbone can be modified to produce morpholino nucleic acids, in which each base moiety is linked to a six-membered morpholino ring, or peptide nucleic acids, in which the deoxyphosphate backbone is replaced by a pseudopeptide backbone and the four bases are retained. See, for example, Summerton and Weller, Antisense Nucleic Acid Drug Dev., 7:187-195 (1997); Hyrup et al., Bioorgan. Med. Chem., 4:5-23 (1996). In addition, the deoxyphosphate backbone can be replaced by, for example, a phosphorothioate or phosphorodithioate backbone, a phosphoroamidite, or an alkyl phosphotriester backbone.
C. Constructs / Vectors
[0178] The recombinant constructs provided herein can be used to transform plants or plant cells in order to modulate biomass levels. A recombinant nucleic acid construct may comprise a nucleic acid encoding a biomass composition modulating polypeptide as described herein, operably linked to a regulatory region suitable for expressing the biomass composition modulating polypeptide in the plant or cell. Thus, a nucleic acid can comprise a coding sequence that encodes a biomass composition modulating polypeptide as shown in SEQ ID NOs: 2, 4, 6, 7, 8, 10, 12, 14, 15, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 32, 34, 36, 37, 39, 41, 43, 45, 47, 49, 50, 52, 53, 55, 57, 59, 61, 63, 65, 66, 68, 70, 71, 72, 73, 75, 77, 78, 79, 81, 82, 84, 86, 88, 90, 92, 94, 96, 97, 99, 100, 101, 102, 104, 105, 107, 109, 111, 113, 115, 117, 118, 120, 122, 124, 126, 128, 130, 132, 133, 135, 136, 138, 139, 141, 143, 145, 147, 148, 149, 151, 152, 153, 155, 157, 158, 160, 161, 162, 163, 164, 165, 167, 168, 170, 171, 172, 173, 175, 177, 179, 181, 182, 184, 185, 187, 189, 190, 192, 194, 196, 197, 199, 201, 202, 204, 205, 207, 208, 210, 211, 213, 215, 216, 218, 220, 221, 222, 223, 225, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 251, 252, 254, 256, 258, 260, 261, 262, 264, 266, 268, 270, 272, 274, 275, 276, 278, 280, 282, 283, 284, 285, 286, 288, 289, 290, 292, 294, 295, 296, 297, 298, 299, 300, 302, 303, 305, 306, 308, 309, 310, 311, 312, 314, 316, 317, 318, 320, 321, 323, 325, 326, 327, 328, 329, 331, 333, 335, 336, 337, 338, 339, 340, 342, 344, 346, 348, 350, 351, 353, 355, 357, 359, 360, 361, 362, 363, 364, 365, 366, 368, 370, 372, 373, 374, 375, 376, 377, 379, 380, 381, 383, 384, 386, 388, 390, 392, 393, 394, 396, 398, 400, 402, 404, 406, 407, 408, 410, 412, 413, 414, 416, 418, 419, 420, 422, 423, 424, 426, 427, 429, 430, 431, 433, 434, 436, 437, 439, 441, 442, 444, 446, 448, 449, 450, 451, 452, 454, 456, 458, 459, 460, 462, 463, 465, 466, 468, 470, 472, 473, 474, 476, 478, 479, 480, 481, 483, 485, 486, 488, 490, 492, 493, 495, 497, 498, 500, 501, 503, 504, 506, 508, 509, 510, 512, 514, 515, 516, 517, 519, 521, 522, 523, 525, 527, 529, 531, 533, 534, 535, 537, 539, 541, 543, 545, 547, 549, 551, 552, 554, 556, 557, 559, 562, 564, 565, 567, 568, 570, 572, 573, 574, 575, 576, 578, 580, 582, 584, 586, 588, 589, 590, 592, 594, 596, 598, 599, 601, 602, 603, 605, 607, 608, 609, 611, 613, 615, 617, 619, 621, 622, 624, 625, 627, 629, 630, 632, 634, 636, 638, 641, 643, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 677, 679, 680, 682, 684, 686, 687, 688, 689, 690, 691, 692, 694, 695, 697, 699, 701, 702, 704, 706, 708, 710, 712, 713, 715, 716, 718, 720, 721, 723, 724, 726, 727, 729, 730, 732, 733, 735, 736, 737, 739, 740, 742, 744, 746, 747, 748, 749, 750, 751, 753, 755, 757, 758, 760, 761, 763, 764, 765, 766, 767, 768, 769, 771, 772, 774, 775, 776, 777, 778, 780, 781, 782, 783, 785, 786, 788, 789, 791, 793, 795, 796, 798, 799, 800, 801, 802, 803, 805, 807, 808, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, and 823. Examples of nucleic acids encoding biomass composition modulating polypeptides are presented in SEQ ID NOs: 1, 3, 5, 9, 11, 13, 16, 23, 25, 27, 31, 33, 35, 38, 40, 42, 44, 46, 48, 51, 54, 56, 58, 60, 62, 64, 67, 69, 74, 76, 80, 83, 85, 87, 89, 91, 93, 95, 98, 103, 106, 108, 110, 112, 114, 116, 119, 121, 123, 125, 127, 129, 131, 134, 137, 140, 142, 144, 146, 150, 154, 156, 159, 166, 169, 174, 176, 178, 180, 183, 186, 188, 191, 193, 195, 198, 200, 203, 206, 209, 212, 214, 217, 219, 224, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 253, 255, 257, 259, 263, 265, 267, 269, 271, 273, 277, 279, 281, 287, 291, 293, 301, 304, 307, 313, 315, 319, 322, 324, 330, 332, 334, 341, 343, 345, 347, 349, 352, 354, 356, 358, 367, 369, 371, 378, 382, 385, 387, 389, 391, 395, 397, 399, 401, 403, 405, 409, 411, 415, 417, 421, 425, 428, 432, 435, 438, 440, 443, 445, 447, 453, 455, 457, 461, 464, 467, 469, 471, 475, 477, 482, 484, 487, 489, 491, 494, 496, 499, 502, 505, 507, 511, 513, 518, 520, 524, 526, 528, 530, 532, 536, 538, 540, 542, 544, 546, 548, 550, 553, 555, 558, 560, 561, 563, 566, 569, 571, 577, 579, 581, 583, 585, 587, 591, 593, 595, 597, 600, 604, 606, 610, 612, 614, 616, 618, 620, 623, 626, 628, 631, 633, 635, 637, 639, 640, 642, 644, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 678, 681, 683, 685, 693, 696, 698, 700, 703, 705, 707, 709, 711, 714, 717, 719, 722, 725, 728, 731, 734, 738, 741, 743, 745, 752, 754, 756, 759, 762, 770, 773, 779, 784, 787, 790, 792, 794, 797, 804, 806, 809, and 822, or in the Sequence Listing. The biomass composition modulating polypeptide encoded by a recombinant nucleic acid can be a native biomass composition modulating polypeptide, or it can be heterologous to the cell. In some cases, the recombinant construct contains a nucleic acid that inhibits the expression of a biomass composition modulating polypeptide, operably linked to a regulatory region. Examples of suitable regulatory regions are described in the section entitled “Regulatory regions”.
[0179] Vectors containing recombinant nucleic acid constructs such as those described herein are also provided. Suitable vector backbones include, for example, those routinely used in the art, such as plasmids, viruses, artificial chromosomes, BACs, YACs, or PACs. Suitable expression vectors include, without limitation, plasmids and viral vectors derived from, for example, bacteriophages, baculoviruses, and retroviruses. Various vectors and expression systems are commercially available from companies such as Novagen® (Madison, WI), Clontech® (Palo Alto, CA), Stratagene® (La Jolla, CA), and Invitrogen/Life Technologies® (Carlsbad, CA).
[0180] The vectors provided herein may also include, for example, origins of replication, scaffold attachment regions (SARs), and/or markers. A marker gene can confer a selectable phenotype on a plant cell. For example, a marker can confer resistance to biocides, such as resistance to an antibiotic (for example, kanamycin, G418, bleomycin, or hygromycin), or to a herbicide (for example, glyphosate, chlorsulfuron or phosphinothricin). In addition, an expression vector can include a tag sequence designed to facilitate manipulation or detection (for example, purification or localization) of the expressed polypeptide. Tag sequences, such as luciferase, β-glucuronidase (GUS), green fluorescent protein (GFP), glutathione S-transferase (GST), polyhistidine, c-myc, hemagglutinin, or Flag™ tag (Kodak, New Haven, CT) sequences, are typically expressed as a fusion with the encoded polypeptide. Such tags can be inserted anywhere in the polypeptide, including at the carboxyl terminus or the amino terminus.
D. Regulatory regions
[0181] The choice of regulatory regions to be included in a recombinant construct depends on several factors, including, but not limited to, efficiency, selectability, inducibility, desired level of expression, and cell- or tissue-preferential expression. It is a routine matter for one skilled in the art to modulate the expression of a coding sequence by appropriately selecting and positioning regulatory regions relative to the coding sequence. The transcription of a nucleic acid can be modulated in a similar way.
[0182] Some suitable regulatory regions initiate transcription only, or predominantly, in some cell types. Methods for the identification and characterization of regulatory regions in plant genomic DNA are known, including, for example, those described in the following references: Jordano et al., Plant Cell, 1: 855-866 (1989); Bustos et al., Plant Cell, 1: 839-854 (1989); Green et al., EMBO J., 7: 4035-4044 (1988); Meier et al., Plant Cell, 3: 309-316 (1991); and Zhang et al., Plant Physiology, 110: 1069-1079 (1996).
[0183] Examples of various classes of regulatory regions are described below. Some of the regulatory regions indicated below, as well as additional regulatory regions, are described in more detail in U.S. Patent Application Nos. 60/505,689; 60/518,075; 60/544,771; 60/558,869; 60/583,691; 60/619,181; 60/637,140; 60/757,544; 60/776,307; 10/957,569; 11/058,689; 11/172,703; 11/208,308; 11/274,890; 60/583,609; 60/612,891; 11/097,589; 11/233,726; 11/408,791; 11/414,142; 10/950,321; 11/360,017; PCT/US05/011105; PCT/US05/23639; PCT/US05/034308; PCT/US05/034343; PCT/US06/038236; PCT/US06/040572; PCT/US07/62762; PCT/US2009/032485; and PCT/US2009/038792.
[0184] For example, the regulatory region strings p326, YP0144, YP0190, p13879, YP0050, p32449, 21876, YP0158, YP0214, YP0380, PT0848, PT0633, YP0128, YP0275, PT0660, PT0683, PT0758, PT0688, PT088 , PT0837, YP0092, PT0676, PT0708, YP0396, YP0007, YP0111, YP0103, YP0028, YP0121, YP0008, YP0039, YP0115, YP0119, YP0120, YP0374, YP0107, Y, P0101, YP0101, YP0101, YP010 , YP0088, YP0143, YP0156, PT0650, PT0695, PT0723, PT0838, PT0879, PT0740, PT0535, PT0668, PT0886, PT0585, YP0381, YP0337, PT0710, YP0356, YP0385, YP0004, YP0387, , PT0678, YP0086, YP0188, YP0263, PT0743 and YP0096 are shown in the sequential listing of PCT / US06 / 040572; the regulatory region sequence PT0625 is shown in the sequential listing of PCT / US05 / 034343; regulatory region strings PT0623, YP0388, YP0087, YP0093, YP0108, YP0022 and YP0080 are shown in the sequential listing of U.S. Patent Application No. 11 / 172,703; the regulatory region sequence PR0924 is shown in the sequential listing of PCT / US07 / 62762; and the sequences of regulatory regions p530c10, pOsFIE2-2, pOsMEA, Petition 870190069500, of 7/22/2019, p. 132/217 124/182 pOsYp102, and pOsYp285 are shown in the sequential listing of PCT / US06 / 038236.
[0185] It will be appreciated that a regulatory region can meet the criteria for one classification based on its activity in one plant species, and yet meet the criteria for a different classification based on its activity in another plant species.
i. Broadly expressing promoters
[0186] A promoter can be said to be "broadly expressing" when it promotes transcription in many, but not necessarily all, plant tissues. For example, a broadly expressing promoter may promote the transcription of an operably linked sequence in one or more of the shoot, the shoot tip (apex), and leaves, but act weakly or not at all in tissues such as roots or stems. As another example, a broadly expressing promoter can promote the transcription of an operably linked sequence in one or more of the stem, the shoot, the shoot tip (apex), and leaves, but act weakly or not at all in tissues such as the reproductive tissues of flowers and developing seeds. Non-limiting examples of broadly expressing promoters that can be included in the nucleic acid constructs provided herein include the p326, YP0144, YP0190, p13879, YP0050, p32449, 21876, YP0158, YP0214, YP0380, PT0848, and PT0633 promoters. Additional examples include the cauliflower mosaic virus (CaMV) 35S promoter, the mannopine synthase (MAS) promoter, the 1' or 2' promoters derived from the T-DNA of Agrobacterium tumefaciens, the figwort mosaic virus 34S promoter, actin promoters such as the rice actin promoter, and ubiquitin promoters such as the maize ubiquitin-1 promoter. In some cases, the CaMV 35S promoter is excluded from the category of broadly expressing promoters.
ii. Root promoters
[0187] Root-active promoters confer transcription in root tissue, for example, in the root endodermis or in root vascular tissues. In some embodiments, root-active promoters are root-preferential promoters, that is, they confer transcription only or predominantly in root tissue. Root-preferential promoters include the YP0128, YP0275, PT0625, PT0660, PT0683, and PT0758 promoters. Other root-preferential promoters include the PT0613, PT0672, PT0688, and PT0837 promoters, which drive transcription primarily in root tissue and to a lesser extent in ovules and/or seeds. Other examples of root-preferential promoters include the root-specific subdomains of the CaMV 35S promoter (Lam et al., Proc. Natl. Acad. Sci. USA, 86:7890-7894 (1989)), the root cell specific promoters described by Conkling et al., Plant Physiol., 93:1203-1211 (1990), and the tobacco RD2 promoter.
iii. Maturing endosperm promoters
[0188] In some embodiments, promoters that drive transcription in maturing endosperm may be useful. Transcription from a maturing endosperm promoter typically begins after fertilization and occurs primarily in endosperm tissue during seed development, typically reaching the highest levels during the cellularization phase. The most suitable promoters are those that are predominantly active in the maturing endosperm, although promoters that are also active in other tissues can sometimes be used. Non-limiting examples of maturing endosperm promoters that may be useful in the nucleic acid constructs provided herein include the napin promoter, the Arcelin-5 promoter, the phaseolin promoter (Bustos et al., Plant Cell, 1(9):839-853 (1989)), the soybean trypsin inhibitor promoter (Riggs et al., Plant Cell, 1(6):609-621 (1989)), the ACP promoter (Baerson et al., Plant Mol. Biol., 22(2):255-267 (1993)), the stearoyl-ACP desaturase promoter (Slocombe et al., Plant Physiol., 104(4):167-176 (1994)), the soybean α' subunit of β-conglycinin promoter (Chen et al., Proc. Natl. Acad. Sci. USA, 83:8560-8564 (1986)), the oleosin promoter (Hong et al., Plant Mol. Biol., 34(3):549-555 (1997)), and zein promoters, such as the 15 kD zein promoter, the 16 kD zein promoter, the 19 kD zein promoter, and the 22 kD and 27 kD zein promoters. Also suitable are the Osgt-1 promoter from the rice glutelin-1 gene (Zheng et al., Mol. Cell Biol., 13:5829-5842 (1993)), the beta-amylase promoter, and the barley hordein promoter. Other maturing endosperm promoters include the YP0092, PT0676, and PT0708 promoters.
iv. Ovary tissue promoters
[0189] Promoters that are active in ovary tissues such as the ovule wall and mesocarp may also be useful, for example, a polygalacturonidase promoter, the banana TRX promoter, the melon actin promoter, YP0396, and PT0623. Examples of promoters that are active primarily in ovules include YP0007, YP0111, YP0092, YP0103, YP0028, YP0121, YP0008, YP0039, YP0115, YP0119, YP0120, and YP0374.
v. Embryo sac/early endosperm promoters
[0190] To achieve expression in the embryo sac/early endosperm, regulatory regions can be used that are active in polar nuclei and/or in the central cell, or in precursors of polar nuclei, but not in egg cells or precursors of egg cells. The most suitable promoters are those that direct expression only or predominantly in polar nuclei or precursors thereof and/or in the central cell. A pattern of transcription that extends from the polar nuclei into early endosperm development can also be observed with embryo sac/early endosperm-preferential promoters, although transcription typically decreases significantly in later endosperm development during and after the cellularization phase. Expression in the zygote or developing embryo is typically not present with embryo sac/early endosperm promoters.
[0191] Promoters that may be suitable include those derived from the following genes: Arabidopsis viviparous-1 (see GenBank No. U93215); Arabidopsis atmycl (see Urao, Plant Mol. Biol., 32:571-57 (1996); Conceicao, Plant, 5:493-505 (1994)); Arabidopsis FIE (GenBank No. AF129516); Arabidopsis MEA; Arabidopsis FIS2 (GenBank No. AF096096); and FIE 1.1 (U.S. Patent No. 6,906,244). Other promoters that may be suitable include those derived from the following genes: maize MAC1 (see Sheridan, Genetics, 142:1009-1020 (1996)); maize Cat3 (see GenBank No. L05934; Abler, Plant Mol. Biol., 22:10131-1038 (1993)). Other promoters that may be useful include the following Arabidopsis promoters: YP0039, YP0101, YP0102, YP0110, YP0117, YP0119, YP0137, DME, YP0285, and YP0212. Other promoters that may be useful include the following rice promoters: p530c10, pOsFIE2-2, pOsMEA, pOsYp102, and pOsYp285.
vi. Embryo promoters
[0192] Regulatory regions that preferentially direct transcription in zygotic cells after fertilization may provide embryo-preferential expression. The most suitable promoters are those that preferentially direct transcription in early-stage embryos prior to the heart stage, but expression in later stages and in maturing embryos is also suitable. Embryo-preferential promoters include the barley lipid transfer protein (Ltp1) promoter (Plant Cell Rep 20:647-654 (2001)), YP0097, YP0107, YP0088, YP0143, YP0156, PT0650, PT0695, PT0723, PT0838, PT0879, and PT0740.
vii. Photosynthetic tissue promoters
[0193] Promoters active in photosynthetic tissue confer transcription in green tissues such as leaves and stems. The most suitable promoters are those that direct expression only or predominantly in such tissues. Examples of these promoters include ribulose-1,5-bisphosphate carboxylase (RbcS) promoters such as the RbcS promoter from eastern larch (Larix laricina), the pine cab6 promoter (Yamamoto et al., Plant Cell Physiol., 35:773-778 (1994)), the wheat Cab-1 promoter (Fejes et al., Plant Mol. Biol., 15:921-932 (1990)), the spinach CAB-1 promoter (Lubberstedt et al., Plant Physiol., 104:997-1006 (1994)), the rice cab1R promoter (Luan et al., Plant Cell, 4:971-981 (1992)), the maize pyruvate orthophosphate dikinase (PPDK) promoter (Matsuoka et al., Proc. Natl. Acad. Sci. USA, 90:9586-9590 (1993)), the tobacco Lhcb1*2 promoter (Cerdan et al., Plant Mol. Biol., 33:245-255 (1997)), the Arabidopsis thaliana SUC2 sucrose-H+ symporter promoter (Truernit et al., Planta, 196:564-570 (1995)), and thylakoid membrane protein promoters from spinach (psaD, psaF, psaE, PC, FNR, atpC, atpD, cab, rbcS). Other photosynthetic tissue promoters include PT0535, PT0668, PT0886, YP0144, YP0380 and PT0585.
viii. Vascular tissue promoters
[0194] Examples of promoters that have high or preferential activity in vascular bundles include YP0087, YP0093, YP0108, YP0022, and YP0080. Other vascular tissue-preferential promoters include the glycine-rich cell wall protein GRP 1.8 promoter (Keller and Baumgartner, Plant Cell, 3(10):1051-1061 (1991)), the Commelina yellow mottle virus (CoYMV) promoter (Medberry et al., Plant Cell, 4(2):185-192 (1992)), and the rice tungro bacilliform virus (RTBV) promoter (Dai et al., Proc. Natl. Acad. Sci. USA, 101(2):687-692 (2004)).
ix. Inducible promoters
[0195] Inducible promoters confer transcription in response to external stimuli such as chemical agents or environmental stimuli. For example, inducible promoters may confer transcription in response to hormones such as gibberellic acid or ethylene, or in response to light or drought. Examples of drought-inducible promoters include YP0380, PT0848, YP0381, YP0337, PT0633, YP0374, PT0710, YP0356, YP0385, YP0396, YP0388, YP0384, PT0688, YP0286, YP0377, and PD1367. Examples of nitrogen-inducible promoters include PT0863, PT0829, PT0665, and PT0886. Examples of shade-inducible promoters include PR0924 and PT0678. An example of a salt-inducible promoter is rd29A (Kasuga et al. (1999) Nature Biotech 17:287-291).
x. Basal promoters
[0196] A basal promoter is the minimal sequence necessary for the assembly of a transcription complex required for transcription initiation. Basal promoters often include a "TATA box" element that can be located between about 15 and about 35 nucleotides upstream from the transcription start site. Basal promoters can also include a "CCAAT box" element (typically the sequence CCAAT) and/or a GGGCG sequence, which can be located between about 40 and about 200 nucleotides, typically about 60 to about 120 nucleotides, upstream from the transcription start site.
xi. Stem promoters
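The positional rule stated in paragraph [0196] for the TATA box (about 15 to about 35 nucleotides upstream from the transcription start site) can be illustrated with a minimal Python sketch. The function name, the motif pattern `TATA[AT]A`, the choice to measure distance from the first nucleotide of the motif, and the example sequence are all assumptions made for illustration; they are not taken from this document.

```python
import re

# Assumed, simplified TATA box motif; the lookahead lets overlapping hits be found.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A))")

def find_tata_boxes(sequence, tss_index, min_upstream=15, max_upstream=35):
    """Return (start, distance) pairs for TATA-like motifs upstream of the TSS.

    sequence  -- promoter DNA as a string (hypothetical input)
    tss_index -- 0-based index of the transcription start site in `sequence`
    distance  -- nucleotides between the motif start and the TSS
    """
    hits = []
    for match in TATA_PATTERN.finditer(sequence.upper()):
        start = match.start()
        distance = tss_index - start
        if min_upstream <= distance <= max_upstream:
            hits.append((start, distance))
    return hits

# Hypothetical promoter fragment: motif begins 30 nt upstream of index 100.
example = "G" * 70 + "TATAAA" + "C" * 24 + "ATG"
print(find_tata_boxes(example, tss_index=100))
```

The window bounds are parameters, so the same sketch could be adapted to scan for a CCAAT element in the 40 to 200 nucleotide window by changing the pattern and the bounds.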
[0197] A stem promoter can be specific to one or more stem tissues or specific to the stem and other parts of the plant. Stem promoters may have high or preferential activity in, for example, epidermis and cortex, vascular cambium, procambium, or xylem. Examples of stem promoters include YP0018, which is described in US 20060015970, and CryIA(b) and CryIA(c) (Braga et al. 2003, Journal of New Seeds 5:209-221).
xii. Other promoters
[0198] Other classes of promoters include, but are not limited to, shoot-preferential, stem-preferential, trichome cell-preferential, guard cell-preferential (such as PT0678), tuber-preferential, parenchyma cell-preferential, and senescence-preferential promoters. In some embodiments, a promoter may preferentially drive expression in reproductive tissues (for example, the PO2916 promoter, SEQ ID NO: 31 in 61/364,903). Promoters designated YP0086, YP0188, YP0263, PT0758, PT0743, PT0829, YP0119, and YP0096, as described in the patent applications referenced above, may also be useful.
xiii. Other regulatory regions
[0199] A 5' untranslated region (UTR) can be included in the nucleic acid constructs described here. A 5' UTR is transcribed, but not translated, and lies between the start site of the transcript and the translation initiation codon, and can include the +1 nucleotide. A 3' UTR can be positioned between the translation termination codon and the end of the transcript. UTRs can have particular functions such as increasing mRNA stability or attenuating translation. Examples of 3' UTRs include, but are not limited to, polyadenylation signals and transcription termination sequences, for example, a nopaline synthase termination sequence.
[0200] It will be appreciated that more than one regulatory region may be present in a recombinant polynucleotide, for example, introns, enhancers, upstream activation regions, transcription terminators, and inducible elements. Thus, for example, more than one regulatory region can be operably linked to the sequence of a polynucleotide that encodes a biomass composition-modulating polypeptide.
[0201] Regulatory regions, such as promoters for endogenous genes, can be obtained by chemical synthesis or by subcloning from a genomic DNA that includes such a regulatory region. A nucleic acid comprising such a regulatory region can further include flanking sequences that contain restriction enzyme sites that facilitate subsequent manipulation.
IV. Transgenic plants and plant cells
A. Transformation
[0202] The invention also features transgenic plant cells and plants comprising at least one recombinant nucleic acid construct described here. A plant or plant cell can be transformed by having a construct integrated into its genome, that is, it can be stably transformed. Stably transformed cells typically retain the introduced nucleic acid with each cell division. A plant or plant cell can also be transiently transformed, such that the construct is not integrated into its genome. Transiently transformed cells typically lose all or some portion of the introduced nucleic acid construct with each cell division, so that the introduced nucleic acid cannot be detected in daughter cells after a sufficient number of cell divisions. Both transiently and stably transformed transgenic plants and plant cells can be useful in the methods described here.
[0203] Transgenic plant cells used in the methods described here can constitute part or all of a whole plant. Such plants can be grown in a manner suitable for the species under consideration, whether in a growth chamber, a greenhouse, or in the field. Transgenic plants can be bred as desired for a particular purpose, for example, to introduce a recombinant nucleic acid into other lines, to transfer a recombinant nucleic acid to other species, or for further selection of other desirable traits. Alternatively, transgenic plants can be propagated vegetatively for those species amenable to such techniques. As used here, a "transgenic plant" also refers to the progeny of an initial transgenic plant, provided that the progeny inherits the transgene. The seeds produced by a transgenic plant can be grown and then selfed (or outcrossed and selfed) to obtain seeds homozygous for the nucleic acid construct.
[0204] Transgenic plants can be grown in suspension culture, or tissue or organ culture. For the purposes of this invention, solid and/or liquid tissue culture techniques can be used. When a solid medium is used, the transgenic plant cells can be placed directly onto the medium or can be placed onto a filter that is then placed in contact with the medium. When a liquid medium is used, the transgenic plant cells can be placed onto a flotation device, for example, a porous membrane that contacts the liquid medium. A solid medium can be, for example, Murashige and Skoog (MS) medium containing agar and an appropriate concentration of an auxin, for example, 2,4-dichlorophenoxyacetic acid (2,4-D), and an appropriate concentration of a cytokinin, for example, kinetin.
[0205] When transiently transformed plant cells are used, a reporter sequence encoding a reporter polypeptide having reporter activity can be included in the transformation procedure, and an assay for reporter activity or expression can be performed at a suitable time after transformation. A suitable time to conduct the assay is typically about 1-21 days after transformation, for example, about 1-14 days, about 1-7 days, or about 1-3 days. The use of transient assays is particularly convenient for rapid analysis in different species, or to confirm the expression of a heterologous biomass composition-modulating polypeptide whose expression has not been previously confirmed in particular recipient cells.
[0206] Techniques for introducing nucleic acids into monocotyledonous and dicotyledonous plants are known in the art, and include, without limitation, Agrobacterium-mediated transformation, viral vector-mediated transformation, electroporation, and particle gun transformation, for example, as described in U.S. Patent Nos. 5,538,880; 5,204,253; 6,329,571 and 6,013,863. If a cultured cell or tissue is used as the recipient tissue for transformation, plants can be regenerated from the transformed cultures if desired, by techniques known to those skilled in the art.
B. Screening/Selection
[0207] A population of transgenic plants can be screened and/or selected for those members of the population that have a trait or phenotype conferred by expression of the transgene. For example, a population of progeny of a single transformation event can be screened for those plants having a desired level of expression of a biomass composition-modulating polypeptide or nucleic acid. Physical and biochemical methods can be used to identify expression levels. These include Southern analysis or PCR amplification to detect a polynucleotide; northern blots, S1 RNase protection, primer extension, or RT-PCR amplification to detect RNA transcripts; enzymatic assays to detect enzymatic or ribozyme activity of polypeptides and polynucleotides; and protein gel electrophoresis, Western blots, immunoprecipitation, and enzyme-linked immunoassays to detect polypeptides. Other techniques such as in situ hybridization, enzyme staining, and immunostaining can also be used to detect the presence or expression of polypeptides and/or polynucleotides. Methods for performing all of the referenced techniques are known. As an alternative, a population of plants comprising independent transformation events can be screened for those plants having a desired trait, such as a modulated level of biomass. Selection and/or screening can be carried out over one or more generations, and/or in more than one geographical location. In some cases, transgenic plants can be grown and selected under conditions that induce a desired phenotype or are otherwise necessary to produce a desired phenotype in a transgenic plant. In addition, selection and/or screening can be applied during a particular developmental stage in which the phenotype is expected to be exhibited by the plant. Selection and/or screening can be carried out to choose those transgenic plants having a statistically significant difference in a biomass level relative to a control plant that lacks the transgene. Selected or screened transgenic plants have an altered phenotype as compared to a corresponding control plant, as described in the section "Phenotypes of transgenic plants" in this document.
C. Plant species
[0208] The polynucleotides and vectors described here can be used to transform a number of monocotyledonous and dicotyledonous plants and plant cell systems, including species from one of the following families: Acanthaceae, Alliaceae, Alstroemeriaceae, Amaryllidaceae, Apocynaceae, Arecaceae, Asteraceae, Berberidaceae, Bixaceae, Brassicaceae, Bromeliaceae, Cannabaceae, Caryophyllaceae, Cephalotaxaceae, Chenopodiaceae, Colchicaceae, Cucurbitaceae, Dioscoreaceae, Ephedraceae, Erythroxylaceae, Euphorbiaceae, Fabaceae, Lamiaceae, Linaceae, Lycopodiaceae, Malvaceae, Melanthiaceae, Musaceae, Myrtaceae, Nyssaceae, Papaveraceae, Pinaceae, Plantaginaceae, Poaceae, Rosaceae, Rubiaceae, Salicaceae, Sapindaceae, Solanaceae, Taxaceae, Theaceae, or Vitaceae.
[0209] Suitable species may include members of the genera Abelmoschus, Abies, Acer, Agrostis, Allium, Alstroemeria, Ananas, Andrographis, Andropogon, Artemisia, Arundo, Atropa, Berberis, Beta, Bixa, Brassica, Calendula, Camellia, Camptotheca, Cannabis, Capsicum, Carthamus, Catharanthus, Cephalotaxus, Chrysanthemum, Cinchona, Citrullus, Coffea, Colchicum, Coleus, Cucumis, Cucurbita, Cynodon, Datura, Dianthus, Digitalis, Dioscorea, Elaeis, Ephedra, Erianthus, Erythroxylum, Eucalyptus, Festuca, Fragaria, Galanthus, Glycine, Gossypium, Helianthus, Hevea, Hordeum, Hyoscyamus, Jatropha, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Lycopodium, Manihot, Medicago, Mentha, Miscanthus, Musa, Nicotiana, Oryza, Panicum, Papaver, Parthenium, Pennisetum, Petunia, Phalaris, Phleum, Pinus, Poa, Poinsettia, Populus, Rauwolfia, Ricinus, Rosa, Saccharum, Salix, Sanguinaria, Scopolia, Secale, Solanum, Sorghum, Spartina, Spinacea, Tanacetum, Taxus, Theobroma, Triticosecale, Triticum, Uniola, Veratrum, Vinca, Vitis, and Zea.
[0210] Suitable species include Panicum spp., Sorghum spp., Miscanthus spp., Saccharum spp., Erianthus spp., Populus spp., Andropogon gerardii (big bluestem), Pennisetum purpureum (elephant grass), Phalaris arundinacea (reed canarygrass), Cynodon dactylon (bermudagrass), Festuca arundinacea (tall fescue), Spartina pectinata (prairie cordgrass), Medicago sativa (alfalfa), Arundo donax (giant reed), Secale cereale (rye), Salix spp. (willow), Eucalyptus spp. (eucalyptus), Triticosecale (triticale, a wheat x rye hybrid), and bamboo.
[0211] Suitable species also include Helianthus annuus (sunflower), Carthamus tinctorius (safflower), Jatropha curcas (jatropha), Ricinus communis (castor), Elaeis guineensis (palm), Linum usitatissimum (flax), and Brassica juncea.
[0212] Suitable species also include Beta vulgaris (beet), and Manihot esculenta (cassava).
[0213] Suitable species also include Lycopersicon esculentum (tomato), Lactuca sativa (lettuce), Musa paradisiaca (banana), Solanum tuberosum (potato), Brassica oleracea (broccoli, cauliflower, Brussels sprouts), Camellia sinensis (tea), Fragaria ananassa (strawberry), Theobroma cacao (cocoa), Coffea arabica (coffee), Vitis vinifera (grape), Ananas comosus (pineapple), Capsicum annuum (hot and sweet peppers), Allium cepa (onion), Cucumis melo (melon), Cucumis sativus (cucumber), Cucurbita maxima (pumpkin), Cucurbita moschata (pumpkin), Spinacea oleracea (spinach), Citrullus lanatus (watermelon), Abelmoschus esculentus (okra), and Solanum melongena (eggplant).
[0214] Suitable species also include Papaver somniferum (opium poppy), Papaver orientale, Taxus baccata, Taxus brevifolia, Artemisia annua, Cannabis sativa, Camptotheca acuminata, Catharanthus roseus, Vinca rosea, Cinchona officinalis, Colchicum autumnale, Veratrum californica, Digitalis lanata, Digitalis purpurea, Dioscorea spp., Andrographis paniculata, Atropa belladonna, Datura stramonium, Berberis spp., Cephalotaxus spp., Ephedra sinica, Ephedra spp., Erythroxylum coca, Galanthus woronowii, Scopolia spp., Lycopodium serratum, Lycopodium spp., Rauwolfia serpentina, Rauwolfia spp., Sanguinaria canadensis, Hyoscyamus spp., Calendula officinalis, Chrysanthemum parthenium, Coleus forskohlii, and Tanacetum parthenium.
[0215] Suitable species also include Parthenium argentatum (guayule), Hevea spp. (rubber tree), Mentha spicata (mint), Mentha piperita (mint), Bixa orellana, and Alstroemeria spp.
[0216] Suitable species also include Rosa spp. (rose), Dianthus caryophyllus (carnation), Petunia spp. (petunia) and Poinsettia pulcherrima (poinsettia).
[0217] Suitable species also include Nicotiana tabacum (tobacco), Lupinus albus (lupine), Uniola paniculata (oats), bentgrass (Agrostis spp.), Populus tremuloides (aspen), Pinus spp. (pine), Abies spp. (fir), Acer spp. (maple), Hordeum vulgare (barley), Poa pratensis (bluegrass), Lolium spp. (ryegrass) and Phleum pratense (timothy).
[0218] In some embodiments, a suitable species may be a wild, weedy, or cultivated species of Pennisetum such as, but not limited to, Pennisetum alopecuroides, Pennisetum arnhemicum, Pennisetum caffrum, Pennisetum clandestinum, Pennisetum divisum, Pennisetum glaucum, Pennisetum latifolium, Pennisetum macrostachyum, Pennisetum macrourum, Pennisetum orientale, Pennisetum pedicellatum, Pennisetum polystachion, Pennisetum polystachion ssp. Setosum, Pennisetum purpureum, Pennisetum setaceum, Pennisetum subangustum, Pennisetum typhoides, Pennisetum villosum, or hybrids thereof (e.g., Pennisetum purpureum x Pennisetum typhoidum).
[0219] In some embodiments, a suitable species may be a wild, weedy, or cultivated Miscanthus species and / or variety such as, but not limited to, Miscanthus x giganteus, Miscanthus sinensis, Miscanthus x ogiformis, Miscanthus floridulus, Miscanthus transmorrisonensis, Miscanthus oligostachyus, Miscanthus nepalensis, Miscanthus sacchariflorus, Miscanthus x giganteus 'Amuri', Miscanthus x giganteus 'Nagara', Miscanthus x giganteus 'Illinois', Miscanthus sinensis var. 'Goliath', Miscanthus sinensis var. 'Roland', Miscanthus sinensis var. 'Africa', Miscanthus sinensis var. 'Fern Osten', Miscanthus sinensis var. gracillimus, Miscanthus sinensis var. variegates, Miscanthus sinensis var. purpurascens, Miscanthus sinensis var. ‘Malepartus’, Miscanthus sacchariflorus var. 'Robusta', Miscanthus sinensis var. ‘Silberfedher’ (also known as Silver Feather), Miscanthus transmorrisonensis, Miscanthus condensatus, Miscanthus yakushimanum, Miscanthus var. ‘Alexander’, Miscanthus var. ‘Adagio’, Miscanthus var. 'Autumn Light', Miscanthus var. 'Cabaret', Miscanthus var. 'Condensatus', Miscanthus var. ‘Cosmopolitan’, Miscanthus var. 'Dixieland', Miscanthus var. 'Gilded Tower' (U.S. Patent No. PP14,743), Miscanthus var. 'Gold Bar' (U.S. Patent No. PP15,193), Miscanthus var. 'Gracillimus', Miscanthus var. 'Graziella', Miscanthus var. 'Grosse Fontaine', Miscanthus var. ‘Hinjo, also known as Little Nicky’ ™, Miscanthus var. 'Juli', Miscanthus var. 'Kaskade', Miscanthus var. ‘Kirk Alexander’, Miscanthus var. 'Kleine Fontaine', Miscanthus var. ‘Kleine Silberspinne’ (also known as ‘Little Silver Spider’), Miscanthus var. 'Little Kitten', Miscanthus var. 'Little Zebra' (U.S. Patent No. PP13,008), Miscanthus var. 'Lottum', Miscanthus var. 'Malepartus', Miscanthus var. 'Morning Light', Miscanthus var. 'Mysterious Maiden' (U.S. Patent No. PP16,176), Miscanthus var. 'Nippon', Miscanthus var. 'November Sunset', Miscanthus var. 'Parachute', Miscanthus var. 'Positano', Miscanthus var. 
‘Puenktchen’ (also known as ‘Little Dot’), Miscanthus var. 'Rigoletto', Miscanthus var. 'Sarabande', Miscanthus var. ‘Silberpfeil’ (also known as Silver Arrow), Miscanthus var. ‘Silverstripe’, Miscanthus var. 'Super Stripe' (U.S. Patent No. PP18,161), Miscanthus var. ‘Strictus’, or Miscanthus var. ‘Zebrinus’.
[0220] In some embodiments, a suitable species may be a wild, weedy, or cultivated Sorghum species and/or variety such as, but not limited to, Sorghum almum, Sorghum amplum, Sorghum angustum, Sorghum arundinaceum, Sorghum bicolor (such as bicolor, guinea, caudatum, kafir, and durra), Sorghum brachypodum, Sorghum bulbosum, Sorghum burmahicum, Sorghum controversum, Sorghum drummondii, Sorghum ecarinatum, Sorghum exstans, Sorghum grande, Sorghum halepense, Sorghum interjectum, Sorghum leiocladum, Sorghum macrospermum, Sorghum assassankense, Sorghum miliaceum, Sorghum nigrum, Sorghum nitidum, Sorghum plumosum, Sorghum propinquum, Sorghum purpureosericeum, Sorghum stipoideum, Sorghum sudanensese, Sorghum timorense, Sorghum timorum, Sorghum trichocladum, Sorghum x almum, Sorghum x sudangrass, or Sorghum x drummondii.
[0221] Thus, the methods and compositions can be used on a wide variety of plant species, including species of the dicot genera Brassica, Carthamus, Glycine, Gossypium, Helianthus, Jatropha, Parthenium, Populus, and Ricinus; and the monocot genera Elaeis, Festuca, Hordeum, Lolium, Oryza, Panicum, Pennisetum, Phleum, Poa, Saccharum, Secale, Sorghum, Triticosecale, Triticum, and Zea. In some embodiments, the plant is a member of the species Panicum virgatum (switchgrass), Sorghum bicolor (sorghum), Miscanthus giganteus (miscanthus), Saccharum sp. (energy cane), Populus balsamifera (poplar), Zea mays (corn), Glycine max (soybean), Brassica napus (canola), Triticum aestivum (wheat), Gossypium hirsutum (cotton), Oryza sativa (rice), Helianthus annuus (sunflower), or Pennisetum glaucum (pearl millet).
[0222] In certain embodiments, the polynucleotides and vectors described here can be used to transform a number of monocotyledonous and dicotyledonous plants and plant cell systems, where such plants are hybrids of different species or of varieties of a specific species (for example, Saccharum sp. x Miscanthus sp., Sorghum sp. x Miscanthus sp., Panicum virgatum x Panicum amarum, Panicum virgatum x Panicum amarulum, and Pennisetum purpureum x Pennisetum typhoidum).
D. Phenotypes of transgenic plants
[0223] In some embodiments, a plant in which the expression of a biomass composition-modulating polypeptide is modulated has higher or lower levels of sucrose, ash, or cell wall content. A plant in which the expression of a biomass composition-modulating polypeptide is modulated may also have higher or lower conversion efficiency. A component of the biomass composition can be increased by at least 2 percent, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more than 60 percent, when compared to the level of the biomass component in a corresponding control plant that does not express the transgene. In some embodiments, a plant in which the expression of a biomass composition-modulating polypeptide is modulated may have lower levels of a biomass component. The level can be decreased by at least 2 percent, for example, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, or more than 35 percent, when compared to the level in a corresponding control plant that does not express the transgene.
[0224] An increase in the amount of a biomass component (for example, sucrose) in such plants may provide improved nutritional availability in geographic locations where consumption of plant foods is generally insufficient, or may benefit energy production (for example, through conversion efficiency). In some embodiments, decreases in the amount of a biomass component in such plants can be useful in energy production.
[0225] In some embodiments, a plant in which the expression of a biomass composition-modulating polypeptide is modulated has higher or lower levels of a biomass component (for example, sucrose content) in one or more plant tissues, for example, vegetative tissues, reproductive tissues, or root tissues. For example, the level of a biomass component can be increased by at least 2 percent, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more than 60 percent, when compared to the level in a corresponding control plant that does not express the transgene. In some embodiments, a plant in which the expression of a biomass composition-modulating polypeptide is modulated may have lower levels of a biomass component in one or more plant tissues. The level can be decreased by at least 2 percent, for example, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, or more than 35 percent, when compared to the level in a corresponding control plant that does not express the transgene.
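The percentage comparisons in paragraphs [0223] and [0225] can be made concrete with a small Python sketch. The function names and the numeric values are hypothetical and serve only to show the arithmetic: the change is expressed relative to the level measured in the corresponding control plant, and the 2 percent figure from the text is used as the default minimum.

```python
def percent_change(transgenic_level, control_level):
    """Percent change of a biomass component relative to the control plant."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return (transgenic_level - control_level) * 100.0 / control_level

def meets_threshold(transgenic_level, control_level, min_percent=2.0):
    """True if the level is increased or decreased by at least min_percent."""
    return abs(percent_change(transgenic_level, control_level)) >= min_percent

# Hypothetical sucrose measurements in arbitrary but matching units.
print(percent_change(12.0, 10.0))    # increase relative to control
print(meets_threshold(9.0, 10.0))    # a decrease is assessed the same way
```

Whether a given change is also statistically significant is a separate question from its size and has to be assessed with replicate measurements.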
[0226] Typically, a difference in the amount of a biomass component in a transgenic plant or cell compared to a control plant or cell is considered statistically significant at p < 0.05 with an appropriate parametric or nonparametric statistical test, for example, a Chi-square test, Student's t test, Mann-Whitney test, or F test. In some embodiments, a difference in the amount of a biomass component is statistically significant at p < 0.01, p < 0.005, or p < 0.001. A statistically significant difference in, for example, the amount of a biomass component in a transgenic plant when compared to the amount in a control plant indicates that the recombinant nucleic acid present in the transgenic plant results in an altered biomass composition.
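As a rough illustration of the kind of comparison described above, the following sketch computes an exact two-sided permutation test on the difference in group means. This is a nonparametric stand-in chosen because it needs only the Python standard library; it is not one of the tests named above, and the function name and the measurement values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(transgenic, control):
    """Exact two-sided permutation p-value for a difference in means.

    Every way of splitting the pooled measurements into two groups of the
    original sizes is enumerated; the p-value is the fraction of splits whose
    absolute mean difference is at least as large as the observed one.
    Only practical for small sample sizes.
    """
    pooled = list(transgenic) + list(control)
    n = len(transgenic)
    observed = abs(mean(transgenic) - mean(control))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical sucrose measurements (arbitrary units), four plants per group.
p = permutation_p_value([11.0, 12.0, 13.0, 14.0], [1.0, 2.0, 3.0, 4.0])
print("significant at p < 0.05:", p < 0.05)
```

With larger sample sizes, a library implementation of one of the named tests would normally be used instead of exhaustive enumeration.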
[0227] The phenotype of a transgenic plant is evaluated in relation to a control plant. A plant is said to “not express” a polypeptide when the plant exhibits less than 10%, for example, less than 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.5%, 0.1%, 0.01%, or 0.001%, of the amount of the polypeptide, or of the mRNA that encodes the polypeptide, exhibited by the plant of interest. Expression can be evaluated using methods that include, for example, RT-PCR, northern blots, S1 RNase protection, primer extensions, western blots, protein gel electrophoresis, immunoprecipitation, enzyme-linked immunoassays, chip assays, and mass spectrometry. It should be noted that if a polypeptide is expressed under the control of a tissue-preferred or broadly expressed promoter, the expression can be evaluated in the entire plant or in a selected tissue. Similarly, if a polypeptide is expressed at a particular time, for example, at a particular stage in development or under induction, the expression can be selectively evaluated over a desired period of time.
[0228] Biomass can include harvestable plant tissues such as leaves, stems, and reproductive structures, or all plant tissues such as leaves, stems, roots, and reproductive structures. In some embodiments, biomass includes only the above-ground parts of the plant. In some embodiments, biomass comprises only the parts of the plant that are considered stems. In some embodiments, biomass includes only the above-ground parts of the plant, except the inflorescences and seeds of the plant. Biomass can be assessed as described in the example sections. Biomass can be quantified as the dry matter yield, which is the mass of biomass produced (usually reported in ton/acre) once the contribution of water is subtracted from the fresh matter weight. The dry matter yield (DMY) is calculated from the fresh matter weight (FMW) and a measure of the percentage moisture by mass (M) using the following equation: DMY = ((100 - M) / 100) * FMW. Biomass can also be quantified as the fresh matter yield, which is the mass of biomass produced (usually reported in ton/acre) on an as-received basis, which includes the mass of moisture. VI. Modification of endogenous nucleic acids encoding biomass composition-modulating polypeptides
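The dry matter yield equation given in paragraph [0228] can be sketched in a few lines of Python. The function name and the example numbers are illustrative assumptions; only the formula DMY = ((100 - M) / 100) * FMW comes from the text.

```python
def dry_matter_yield(fresh_matter_weight, moisture_percent):
    """Return DMY = ((100 - M) / 100) * FMW, in the same units as FMW.

    fresh_matter_weight: fresh matter weight (FMW), e.g. in ton/acre.
    moisture_percent:    moisture content by mass (M), in percent (0-100).
    """
    if not 0 <= moisture_percent <= 100:
        raise ValueError("moisture_percent must be between 0 and 100")
    return (100 - moisture_percent) / 100 * fresh_matter_weight


# Hypothetical example: 10 ton/acre of fresh matter at 60 percent moisture.
print(dry_matter_yield(10.0, 60.0))
```

Because the result carries the units of FMW, the same function applies whether the fresh matter weight is reported per plant or per unit area.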
[0229] This document also features plants and plant cells in which an endogenous biomass composition-modulating nucleic acid described herein has been modified (for example, a regulatory region, intron, or coding region of the biomass composition-modulating nucleic acid has been modified). The biomass composition of such plants is altered relative to the corresponding composition of a control plant in which the endogenous nucleic acid has not been modified. Such plants are referred to herein as modified plants and can be used to produce, for example, larger amounts of a biomass component (for example, sucrose).
[0230] An endogenous nucleic acid can be modified by homologous recombination techniques. For example, sequence-specific endonucleases (e.g., zinc finger nucleases (ZFNs)) and meganucleases can be used to stimulate homologous recombination in endogenous plant genes. See, for example, Townsend et al., Nature 459: 442-445 (2009); Tovkach et al., Plant J., 57: 747-757 (2009); and Lloyd et al., Proc. Natl. Acad. Sci. USA, 102: 2232-2237 (2005). In particular, ZFNs designed to create double-stranded DNA breaks at specific loci can be used to effect targeted sequence changes in endogenous plant genes. For example, an endogenous plant gene can be replaced by a variant containing one or more mutations (for example, produced using site-specific mutagenesis or directed evolution). In some embodiments, site-specific mutagenesis is achieved by non-homologous end joining, in which, after the DNA is cleaved, endogenous DNA repair mechanisms ligate the break, often introducing small deletions or additions that can be screened at the cell or plant level for desired phenotypes. Moore and Haber, Mol Cell Biol., 16 (5): 2164-73 (1996).
[0231] In some embodiments, endogenous nucleic acids can be modified by methylation or demethylation so that the expression of the endogenous nucleic acid is altered. For example, a double-stranded RNA can be used to activate gene expression by targeting non-coding regulatory regions in gene promoters. See Shibuya et al., Proc Natl Acad Sci USA, 106 (5): 1660-1665 (2009); and Li et al., Proc Natl Acad Sci USA, 103 (46): 17337-42 (2006). In some embodiments, ZFNs designed to create double-stranded DNA breaks at specific loci can be used to insert a DNA fragment with at least one region that overlaps with endogenous DNA to facilitate homologous recombination, so that the non-overlapping portion of the DNA fragment is integrated at the break site. For example, a fragment can be inserted into an endogenous promoter and/or regulatory region at a specific site where a ZFN has created a double-strand break to alter the expression of an endogenous gene. For example, a fragment that is inserted into a coding region of an endogenous gene at a specific site where a ZFN has created a double-strand break may result in the expression of a chimeric gene. For example, a fragment that functions as a regulatory or promoter region and that is inserted into a region of endogenous DNA immediately upstream of a gene coding sequence at a specific site where a ZFN has created a double-strand break may result in altered expression of the endogenous gene.
[0232] In some embodiments, endogenous nucleic acids can be modified using activation tagging. For example, a vector containing multiple copies of an enhancer element from the constitutively active promoter of the cauliflower mosaic virus (CaMV) 35S gene can be used to activate an endogenous gene. See Weigel et al., Plant Physiology, 122: 1003-1013 (2000).
[0233] In some embodiments, endogenous nucleic acids can be modified by introducing an engineered transcription activation/repression factor (e.g., a zinc finger protein transcription factor, or ZFP TF. See, for example, on the internet, sangamo.com/tech/tech_plat_over.html#whatarezfp). For example, a synthetic transcription factor sequence comprising a zinc finger DNA binding domain and a VP16 activation domain can be designed to bind to a specific endogenous DNA site and alter the expression of an endogenous gene. An engineered transcription activation/repression factor (such as a ZFP TF) can activate, repress, or switch the expression of an endogenous biomass, sucrose, and/or conversion target gene by binding specifically to the promoter region or coding region of the endogenous gene. Engineered nucleases that cleave specific DNA sequences in vivo can also be valuable reagents for targeted mutagenesis. One such class of sequence-specific nucleases can be created by fusing transcription activator-like effectors (TALEs) to the catalytic domain of the FokI endonuclease. Both native and artificial TALE-nuclease fusions direct double-stranded DNA breaks to specific target sites. Christian, et al., Genetics 186: 757-761 (2010).
[0234] In some embodiments, endogenous nucleic acids can be modified by mutagenesis. Genetic mutations can be introduced into regenerable plant tissues using one or more mutagens. Suitable mutagenic agents include, for example, ethyl methanesulfonate (EMS), N-nitroso-N-ethylurea (ENU), methyl N-nitrosoguanidine (MNNG), ethidium bromide, diepoxybutane, ionizing radiation, X-rays, UV rays, and other mutagens known in the art. Suitable types of mutations include, for example, nucleotide insertions or deletions, and transitions or transversions in the endogenous nucleic acid sequence. In one embodiment, TILLING (Targeting Induced Local Lesions IN Genomes) can be used to produce plants with a modified endogenous nucleic acid. TILLING combines high-density mutagenesis with high-throughput screening methods. See, for example, McCallum et al., Nat Biotechnol 18: 455-457 (2000); reviewed by Stemple, Nat Rev Genet 5 (2): 145-50 (2004).
[0235] In some embodiments, an endogenous nucleic acid can be modified via a gene silencing technique. See, for example, the section of this document on “Inhibiting the expression of a biomass composition-modulating polypeptide”.
[0236] A plant population can be screened and/or selected for those members of the population that have a modified nucleic acid. A plant population can also be screened and/or selected for those members of the population that have a trait or phenotype conferred by the expression of the modified nucleic acid. Alternatively, a plant population can be screened for those plants with a desired trait, such as a modulated level of biomass. For example, a population of progeny can be screened for those plants with a desired level of expression of a biomass composition-modulating polypeptide or nucleic acid. Physical and biochemical methods can be used to identify modified nucleic acids and/or expression levels, as described for transgenic plants. Selection and/or screening can be carried out over one or more generations, and/or in more than one geographical location. In some cases, plants can be grown and selected under conditions that induce a desired phenotype or are otherwise necessary to produce a desired phenotype in a modified plant. In addition, selection and/or screening can be applied during a particular stage of development in which the phenotype is expected to be exhibited by the plant. Selection and/or screening can be performed to choose those modified plants with a statistically significant difference in biomass composition compared to a control plant in which the nucleic acid has not been modified. Selected or screened modified plants have an altered phenotype when compared to a corresponding control plant, as described in the section “Phenotypes of transgenic plants” in this document.
[0237] Although a plant or plant cell in which an endogenous biomass composition-modulating nucleic acid has been modified is not transgenic for that particular nucleic acid, it will be appreciated that such a plant or cell may contain transgenes. For example, a modified plant may contain a transgene for other traits, such as herbicide tolerance or insect resistance. As another example, a modified plant may contain one or more transgenes that, together with modifications of one or more endogenous nucleic acids, result in an increase in a biomass component.
[0238] As in the case of transgenic plant cells, modified plant cells can constitute part or all of a plant. Such plants can be grown in the same way as described for transgenic plants and can be crossed or propagated in the same way as described for transgenic plants. VI. Plant breeding
[0239] Genetic polymorphisms that are useful in such methods include simple sequence repeats (SSRs, or microsatellites), random amplified polymorphic DNAs (RAPDs), single nucleotide polymorphisms (SNPs), amplified fragment length polymorphisms (AFLPs), and restriction fragment length polymorphisms (RFLPs). SSR polymorphisms can be identified, for example, by making sequence-specific probes and amplifying template DNA from individuals in the population of interest by PCR. For example, PCR techniques can be used to enzymatically amplify a genetic marker associated with a nucleotide sequence conferring a specific trait (for example, the nucleotide sequences described herein). PCR can be used to amplify specific sequences from DNA as well as RNA, including sequences from total genomic DNA or total cellular RNA. When RNA is used as the source of template, reverse transcriptase can be used to synthesize complementary DNA (cDNA) strands. Various PCR methods are described, for example, in PCR Primer: A Laboratory Manual, Dieffenbach and Dveksler, eds., Cold Spring Harbor Laboratory Press, 1995.
[0240] Generally, sequence information from polynucleotides flanking the region of interest or beyond is used to design oligonucleotide primers that are identical or similar in sequence to opposite strands of the template to be amplified. Primers are typically 14 to 40 nucleotides in length, but can range from 10 nucleotides to hundreds of nucleotides in length. Template and amplified DNA is repeatedly denatured at a high temperature to separate the double strand, then cooled to allow the primers to anneal and extend nucleotide sequences through the microsatellite, resulting in sufficient DNA for detection of PCR products. If the probes flank an SSR in the population, PCR products of different sizes will be produced. See, for example, U.S. Patent No. 5,766,847.
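The idea that primers flanking an SSR give PCR products of different sizes can be illustrated with a short sketch. It assumes an idealized in silico template given as a single strand (5' to 3'), exact primer matches, and made-up flanking sequences; the function names are hypothetical and no real locus is represented.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def amplicon_length(template, forward, reverse):
    """Length of the product bounded by two exactly matching primers.

    template: one strand of the target, written 5' to 3'.
    forward:  primer identical to the template strand.
    reverse:  primer identical to the opposite strand.
    Returns None if either primer site is not found.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Made-up 14-nucleotide flanking sequences around a (CA)n microsatellite.
LEFT_FLANK = "ACGTTGCATGCCAT"
RIGHT_FLANK = "GGATCCTAGCTAAG"
forward_primer = LEFT_FLANK
reverse_primer = reverse_complement(RIGHT_FLANK)

for repeats in (5, 8):
    allele = "TT" + LEFT_FLANK + "CA" * repeats + RIGHT_FLANK + "AA"
    print(repeats, amplicon_length(allele, forward_primer, reverse_primer))
```

Each additional CA repeat adds two nucleotides to the product, which is the size difference that gel or capillary electrophoresis would resolve.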
[0241] PCR products can be analyzed qualitatively or quantitatively using various techniques. For example, PCR products can be labeled with a fluorescent molecule (for example, PicoGreen® or OliGreen®) and detected in solution using spectrophotometry or capillary electrophoresis. In some cases, PCR products can be separated in a gel matrix (for example, agarose or polyacrylamide) by electrophoresis, and size-fractionated bands comprising the PCR products can be visualized using nucleic acid stains. Suitable stains can fluoresce under UV light (for example, ethidium bromide, GR Safe, SYBR® Green, or SYBR® Gold). The results can be viewed via transillumination or epi-illumination, and an image of the fluorescence pattern can be acquired using a camera or scanner, for example. The image can be processed and analyzed using specialized software (for example, ImageJ) to measure and compare the intensity of a band of interest against a standard loaded on the same gel.
[0242] Alternatively, SSR polymorphisms can be identified using PCR product(s) as a probe against Southern blots from different individuals in the population. See Refseth et al., (1997) Electrophoresis 18: 1519. Briefly, PCR products are separated by length using gel electrophoresis and transferred to a membrane. SSR-specific DNA probes, such as oligonucleotides labeled with radioactive, fluorescent, or chromogenic molecules, are applied to the membrane and hybridize to bound PCR products with a complementary nucleotide sequence. The hybridization pattern can be visualized by autoradiography or by color development on the membrane, for example.
[0243] In some cases, PCR products can be quantified using a real-time thermocycler detection system. For example, quantitative real-time PCR can use a fluorescent dye that forms a DNA-dye complex (for example, SYBR® Green) or a fluorophore-containing DNA probe, such as single-stranded oligonucleotides covalently attached to a fluorescent reporter or fluorophore (for example, 6-carboxyfluorescein or tetrachlorofluorescein) and to a quencher (for example, tetramethylrhodamine or dihydrocyclopyrroloindole tripeptide minor groove binder). The fluorescent signal allows the detection of the amplified product in real time, thus indicating the presence of a sequence of interest, and allowing the quantification of the number of copies of a sequence of interest in cellular DNA or the level of expression of a sequence of interest from cellular mRNA.
[0244] The identification of RFLPs is discussed, for example, in Alonso-Blanco et al. (Methods in Molecular Biology, vol. 82, “Arabidopsis Protocols”, pp. 137-146, J.M. Martinez-Zapater and J. Salinas, eds., c. 1998 by Humana Press, Totowa, NJ); Burr (“Mapping Genes with Recombinant Inbreds”, pp. 249-254, in Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by Springer-Verlag New York, Inc.: New York, NY, USA; Berlin, Germany); Burr et al., Genetics (1998) 118: 519; and Gardiner, J. et al., (1993) Genetics 134: 917. For example, to produce an RFLP library enriched with single-copy or low-copy expressed sequences, total DNA can be digested with a methylation-sensitive enzyme (for example, PstI). Digested DNA can be separated by size on a preparative gel. Polynucleotide fragments (500 to 2000 bp) can be excised, eluted, and cloned into a plasmid vector (for example, pUC18). Southern blots of plasmid digests can be probed with total sheared DNA to select clones that hybridize to single-copy or low-copy sequences. Additional restriction endonucleases can be tested to increase the number of polymorphisms detected.
[0245] The identification of AFLPs is discussed, for example, in EP 0 534 858 and in U.S. Patent No. 5,878,215. In general, total cellular DNA is digested with one or more restriction enzymes. Restriction half-site specific adapters are ligated to all restriction fragments, and the fragments are selectively amplified with two PCR primers that have corresponding adapter and restriction site-specific sequences. PCR products can be visualized after size fractionation, as described above.
[0246] In some embodiments, the methods are directed to breeding a plant line. Such methods use genetic polymorphisms identified as described above in a marker-assisted breeding program to facilitate the development of lines that have a desired change in biomass composition. After a genetic polymorphism has been identified as being associated with variation in a trait, one or more individual plants that have the polymorphic allele correlated with the desired variation are identified. Such plants are then used in a breeding program to combine the polymorphic allele with a plurality of other alleles at other loci that are correlated with the desired variation. Techniques suitable for use in a plant breeding program are known in the art and include, without limitation, backcrossing, mass selection, pedigree breeding, bulk selection, crossing with another population, and recurrent selection. These techniques can be used alone or in combination with one or more other techniques in a breeding program. Thus, each identified plant is selfed or crossed with a different plant to produce seeds that are then germinated to form progeny plants. At least one of these progeny plants is then selfed or crossed with a different plant to form a subsequent generation of progeny. The breeding program can repeat the steps of selfing or outcrossing for 0 to 5 additional generations as appropriate, in order to achieve the desired uniformity and stability in the resulting plant line, which retains the polymorphic allele. In most breeding programs, analysis for the particular polymorphic allele will be performed in each generation, although the analysis can be performed in alternate generations if desired.
[0247] In some cases, selection for other useful traits is also carried out, for example, selection for fungal resistance or bacterial resistance. The selection for such other traits can be carried out before, during, or after the identification of the individual plants that have the desired polymorphic allele. VII. Articles of manufacture
[0248] The transgenic plants provided herein have various uses in the agriculture and energy production industries. For example, the transgenic plants described herein can be used to make animal feed and food products. Such plants, however, are often particularly useful as a feedstock for energy production.
[0249] The transgenic plants described herein often produce higher yields of grain and/or biomass per hectare, compared to control plants that lack the exogenous nucleic acid. In some embodiments, such transgenic plants provide equivalent or even higher yields of grain and/or biomass per hectare compared to control plants when grown under conditions of reduced inputs such as fertilizer or water. In this way, such transgenic plants can be used to provide yield stability at a lower input cost and/or under environmentally stressful conditions such as drought. In some embodiments, the plants described herein have a composition that allows for more efficient processing into free sugars, and subsequently ethanol, for energy production. In some embodiments, such plants provide higher yields of ethanol, butanol, dimethyl ether, other biofuel molecules, and/or sugar-derived by-products per kilogram of plant material, compared to control plants. Such processing efficiencies are believed to derive from the composition of the plant material, including, but not limited to, the content of glycan, cellulose, hemicellulose, and lignin. By providing higher biomass yields at an equivalent or even lower production cost, the transgenic plants described herein increase profitability for farmers and processors, while also lowering the cost to consumers.
[0250] Seeds of the transgenic plants described herein can be conditioned and bagged in packaging material by means known in the art to form an article of manufacture. Packaging materials such as paper and cloth are known in the art. A package of seed can have a label, for example, a tag or label attached to the packaging material, a label printed on the packaging material, or a label inserted inside the package, that describes the nature of the seeds contained therein.
[0251] The invention will be further described in the following examples, which do not limit the scope of the invention described in the claims. VIII. Examples EXAMPLE 1 Conversion analysis procedures
[0252] The conversion efficiency of control and transgenic switchgrass lines was determined indirectly using NIR composition and conversion models for switchgrass. See WO2009/059176. Samples were prepared for analysis by drying the tissue samples for at least 3 days in an incubator set at 45 °C. The dry tissues were ground using a Wiley mill fitted with a 20-mesh screen. The ground samples contained in a flask were scanned three times. The average result of the scans was submitted to the NIR model, and the predicted values of pretreatment liquor (PL) and saccharification (SAC) were determined in this way.
[0253] The conversion yield was calculated directly as follows: [PLN value + SAC value] / biomass mass, where “PLN” refers to the sugar value of the neutralized pretreatment liquor, and “SAC” refers to the sugar value of the saccharification analysis. The following procedures were used to obtain the PLN and SAC values.
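The conversion yield formula above can be written as a small helper. This is only a sketch: the function name, the choice of milligrams of sugar and grams of biomass as units, the optional moisture correction, and the example numbers are all assumptions made for illustration and are not taken from the examples.

```python
def conversion_yield(pln_mg, sac_mg, biomass_g, moisture_percent=0.0):
    """Return (PLN + SAC) / biomass mass, in mg sugar per g biomass.

    pln_mg:           sugar measured in the neutralized pretreatment liquor.
    sac_mg:           sugar measured after saccharification.
    biomass_g:        mass of ground tissue weighed into the vial.
    moisture_percent: optional moisture content used to express the result
                      per gram of dry mass (assumed correction; 0 disables it).
    """
    dry_mass_g = biomass_g * (100.0 - moisture_percent) / 100.0
    if dry_mass_g <= 0:
        raise ValueError("biomass mass must be positive")
    return (pln_mg + sac_mg) / dry_mass_g


# Hypothetical values for a 0.025 g sample.
print(conversion_yield(pln_mg=1.5, sac_mg=6.0, biomass_g=0.025))
```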
[0254] Microwave pretreatment: ground tissues were weighed to obtain approximately 0.025 g. The moisture content of the weighed tissues was determined using a Denver moisture content analyzer. The tissues were transferred to separate Biotage microwave vials that had been previously tared. An appropriate volume of sulfuric acid was then added to the samples to a final concentration of 1.3%. The samples were pretreated in the microwave using the following settings: 165 °C, 5 minutes, very high absorption, 2.0-5.0 mL vials, 600 rpm stirring speed (SWAVE standard). The vials containing the microwaved samples were centrifuged at 4000 rpm for 5 minutes with the deceleration rate set to <5. A minimum of 4 mL of PL from each vial was transferred to labeled 15 mL conical tubes. The pH of the PL fraction was measured. The PL was kept frozen until it was ready to be analyzed. The residue in each vial was washed several times by adding 5 mL of water followed by a centrifugation step at 4000 rpm for 5 minutes. The pH of the wash water was monitored using appropriate pH indicator strips until it reached values between 5 and 6. The solid fraction was saved for saccharification analysis.
[0255] Pretreatment liquor analysis: To determine the PLN (neutralized pretreatment liquor), calcium carbonate was added to an appropriate aliquot of each PL fraction until its pH reached values between 5 and 6. The neutralized mixture was centrifuged at 4000 rpm for 2 minutes, after which 2 mL of the neutralized liquor was transferred to storage tubes.
[0256] To determine the sugar content, the neutralized fraction (PLN) was analyzed using a YSI sugar analyzer and / or by HPLC.
[0257] Saccharification analysis: Water was added to the solid fraction obtained from the microwave pretreatment. An appropriate volume of the enzyme mixture (containing appropriate masses of proprietary enzymes, tetracycline, and cycloheximide in citrate buffer) was added to the mixture, followed by incubation at 50 °C in a rotary incubator. At the appropriate time, an aliquot of the reaction was transferred to a microcentrifuge tube. The reaction was stopped by boiling the mixture for 5 minutes. The mixture was centrifuged for 2 minutes at 14000 rpm. The supernatant was removed for sugar analysis using a YSI sugar analyzer and/or by HPLC. This sugar content represents the SAC value. EXAMPLE 2 Protocol for analysis of sucrose
[0258] The sucrose content of control and transgenic switchgrass lines was determined indirectly using the NIR switchgrass composition model. See WO2009/059176. The samples were prepared for analysis by drying the tissue samples for at least 3 days in an incubator set at 45 °C. The dry tissues were ground using a Wiley mill fitted with a 20-mesh screen. The ground samples contained in a flask were scanned three times. The average result of the scans was submitted to the NIR model, and the predicted values of PL and SAC were determined in this way.
[0259] The sucrose content of selected samples was analyzed directly as follows. An appropriate amount of ground biomass (3 - 4 g) was placed in an extraction cell for extraction using the ASE200 extractor. The extraction was performed using water as the solvent with the extractor configured with the following parameters: pressure of 1500 psi, temperature of 100 °C, no preheating, 5-minute ramp, 7-minute static step, and 2-minute purge. The volume of the collected extract was measured. Appropriate dilutions of the extracts were analyzed by HPLC to quantify the amount of sucrose using reference standards. The % sucrose content was calculated as the amount of sucrose divided by the amount of biomass used in the extraction. EXAMPLE 3 Transgenic switchgrass lines
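Returning to the direct sucrose determination in paragraph [0259], the calculation of % sucrose can be sketched as follows. The text states only that the amount of sucrose is divided by the amount of biomass extracted; recovering the sucrose amount as HPLC concentration times dilution factor times extract volume is an assumption made here, as are the function name, units, and example values.

```python
def percent_sucrose(hplc_conc_mg_per_ml, dilution_factor, extract_volume_ml, biomass_g):
    """Return sucrose as a percentage of the extracted biomass mass.

    hplc_conc_mg_per_ml: sucrose concentration measured in the diluted extract.
    dilution_factor:     fold dilution applied before HPLC (1 = undiluted).
    extract_volume_ml:   measured volume of the collected extract.
    biomass_g:           mass of ground biomass placed in the extraction cell.
    """
    if biomass_g <= 0:
        raise ValueError("biomass mass must be positive")
    # Assumed back-calculation of the total sucrose recovered in the extract.
    sucrose_mg = hplc_conc_mg_per_ml * dilution_factor * extract_volume_ml
    return sucrose_mg / (biomass_g * 1000.0) * 100.0


# Hypothetical run: 0.5 mg/mL in a 10-fold dilution, 30 mL extract, 3.0 g biomass.
print(percent_sucrose(0.5, 10, 30.0, 3.0))
```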
[0260] The following symbols are used in relation to transformations: T0: plant regenerated from transformed tissue culture; T1: first generation progeny from self-pollinated T0 plants; T2: first-generation progeny of self-pollinated T1 plants; T3: first generation progeny of self-pollinated T2 plants.
[0261] The following nucleic acids were isolated from Panicum virgatum plants: CeresClone: 1807011 (SEQ ID NO: 1); Ceres Clone 1955550 (SEQ ID NO: 64); CeresClone: 240112 (SEQ ID NO: 245); CeresClone: 1900192 (SEQ ID NO: 279); CeresClone: 1776501 (SEQ ID NO: 347); CeresClone: 1804732 (SEQ ID NO: 415); CeresClone: 1955550 (SEQ ID NO: 640); and CeresClone: 1789981 (SEQ ID NO: 773).
[0262] Each isolated nucleic acid described above was cloned into a binary T-DNA vector, which was introduced into switchgrass (clonally propagated lines A26 or A10) by Agrobacterium-mediated transformation essentially as described in Richards et al., Plant Cell Rep. 20: 48-54 (2001) and Somleva et al., Crop Sci. 42: 2080-2087 (2002). At least two independent events for each transformation were selected for further studies; these events were referred to as screened switchgrass lines. T0 plants were grown in a greenhouse. The presence of each construct was confirmed by PCR. EXAMPLE 4 Prediction by NIR of conversion to transgenic strain PV00467
[0263] T0 tissues from 22 PV00467 events containing Ceres Clone 1955550 (SEQ ID NO: 64) were analyzed as described in Example 1. Several non-transgenic wild-type plants that were regenerated at the same time as the transgenic plants were used as control (also called batch wild type control plants). The amount of glucose released after acid pretreatment (mg / g) of the PV00467 strains is shown in Table 1. The mean of the wild-type control plants (that is, the mass mean of the batch) and the global mean of different controls from different batches (ie, mass mean performed) are also shown in Table 1. The predicted glucose released in the pre-treatment liquor for some of the PV00467 transgenic events was higher when compared to wild type controls (both using the mean value in mass of batches and the value of the mass average performed). Table 1


EXAMPLE 5 NIR prediction of conversion to transgenic strain PV00508
[0264] T0 tissues from 25 PV00508 events containing Ceres Clone 1776501 (SEQ ID NO: 347) were analyzed as described in Example 1. Several non-transgenic wild-type plants that were regenerated at the same time as the transgenic plants were used as control (also called batch wild type control plants). The amount of glucose released after acid pretreatment (mg / g) of the PV00508 strains is shown in Table 2. The mean of the wild type control plants (that is, the mass mean of the batch) and the global mean of different controls from different batches (ie, mass mean performed) are also shown in Table 2. The predicted glucose released in the pretreatment liquor for some of the transgenic PV00508 events was higher when compared to wild type controls (both using the mean value in mass of batches and the value of the mass average performed). Table 2

EXAMPLE 6 Sucrose content of UAC-20, UAC-22, and UAC-15 transgenic strains
[0265] T0 tissues from 5 UAC-20 events containing Ceres Clone 1900192 (SEQ ID NO: 279), 7 UAC-22 events containing Ceres Clone 1807011 (SEQ ID NO: 1), and 3 UAC-15 events containing Ceres Clone 1804732 (SEQ ID NO: 415) were analyzed as described in Example 2. UAC-FA4 and UAC-NK4K were used as controls. UAC-FA4 is a regenerated wild-type plant that has not been transformed. UAC-NB4K corresponds to plants that have been regenerated from a stem transformed with an empty vector (that is, without inserts). The average total sucrose content is shown in Table 3. All seven UAC-22 events had a higher total sucrose content, while three of the UAC-20 events and two of the UAC-15 events had a higher total sucrose content. Table 3

EXAMPLE 7 Prediction by NIR from conversion to transgenic lines UAC-15, UAC-19, and UAC-22
[0266] T0 tissues from a UAC-15 event containing Ceres Clone 1804732 (SEQ ID NO: 415), a UAC-19 event containing Ceres Clone 1789981 (SEQ ID NO: 773), and a UAC-22 event containing Ceres Clone 1807011 (SEQ ID NO: 1) were analyzed as described in Example 1. UAC-FA4 and UAC-NK4K were used as controls, and NREL SWG was used as the reference standard. UAC-FA4 is a regenerated wild-type plant that has not been transformed. UAC-NB4K corresponds to plants that have been regenerated from a stem transformed with an empty vector (that is, without inserts). NREL SWG is a composite switchgrass biomass obtained from the National Renewable Energy Laboratory (NREL) and was used as a methodological control to determine the consistency of the analytical techniques. The amount of total glucose released per gram of dry mass and the values of PLN and SAC are shown in Table 4 for four experiments in which different amounts of enzymes were used in the saccharification analysis. Higher values of total glucose released per gram of dry matter were observed for each of the transgenic lines regardless of the amount of enzyme. At the standard enzyme level (i.e., 20 mgP/g), the total glucose released by the transgenic lines UAC-15-6 and UAC-19-2 was higher than that of the controls and the reference standard. This increase was primarily due to an increase in the amount of glucose released during pretreatment. When the amount of enzymes was reduced 8-fold (i.e., 2.5 mgP/g), the total glucose released by the transgenic lines UAC-15-6 and UAC-19-2 was similar to that of the control treated at the standard enzyme level. Table 4


EXAMPLE 8 NIR prediction of conversion for transgenic line PV00460
[0267] T0 tissues from three PV00460 events containing Ceres Clone 240112 (SEQ ID NO: 245) were analyzed as described in Example 1. Pv-WT (A26)-72 was the wild-type control used, and corresponds to a plant that was regenerated but not transformed. The amount of total glucose released per gram of dry mass and the PLN and SAC values are shown in Table 5 for four experiments in which different amounts of enzyme were used in the saccharification analysis. Higher values of total glucose released per gram of dry mass were observed for each of the transgenic events regardless of the amount of enzyme. At the standard enzyme level (i.e., 20 mgP/g), the total glucose released by transgenic line PV00460 (especially event #18) was higher than that of the controls and the reference standard. This increase was primarily due to an increase in the amount of glucose released during pretreatment. When the amount of enzyme was reduced 8-fold (i.e., to 2.5 mgP/g), the total glucose released by transgenic line PV00460 (e.g., event #18) was similar to that of the control treated at the standard enzyme level.
Table 5



EXAMPLE 9 Determination of functional homologs by reciprocal BLAST
[0268] A candidate sequence was considered to be a functional homolog of a reference sequence if the candidate and reference sequences encode proteins with a similar function and/or activity. A process known as reciprocal BLAST (Rivera et al., Proc. Natl. Acad. Sci. USA, 95: 6239-6244 (1998)) was used to identify potential functional homolog sequences from databases consisting of all available public and proprietary peptide sequences, including NCBI NR and peptide translations from Ceres clones.
[0269] Before the start of a reciprocal BLAST process, a specific reference polypeptide was searched against all peptides from its source species using BLAST in order to identify polypeptides with a BLAST sequence identity of 80% or more to the reference polypeptide and an alignment length of 85% or more along the shorter sequence in the alignment. The reference polypeptide and any of the aforementioned identified polypeptides were treated as a pool.
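The pooling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record type `SameSpeciesHit`, its field names, and the function `build_pool` are hypothetical and do not come from the specification, and the 85% alignment-length criterion is interpreted here as the aligned length expressed as a percentage of the shorter of the two sequences.

```python
from dataclasses import dataclass


@dataclass
class SameSpeciesHit:
    """One BLAST hit of the reference against a peptide from its own species.

    Hypothetical record; field names are illustrative only.
    """
    subject_id: str
    percent_identity: float  # BLAST sequence identity, in percent
    alignment_length: int    # aligned region, in residues
    query_length: int        # length of the reference polypeptide
    subject_length: int      # length of the candidate polypeptide


def build_pool(reference_id, hits, min_identity=80.0, min_coverage=85.0):
    """Return the reference plus every hit passing both thresholds."""
    pool = [reference_id]
    for hit in hits:
        # Assumed reading: coverage is measured along the shorter sequence.
        shorter = min(hit.query_length, hit.subject_length)
        coverage = 100.0 * hit.alignment_length / shorter
        if hit.percent_identity >= min_identity and coverage >= min_coverage:
            pool.append(hit.subject_id)
    return pool
```

The resulting pool is what a later reverse search would be checked against, so the reference itself is always its first member.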
[0270] The BLASTP version 2.0 program from Washington University at Saint Louis, Missouri, USA, was used to determine the BLAST sequence identity and the E-value. The BLASTP version 2.0 program includes the following parameters: 1) an E-value cutoff of 1.5e-5; 2) a word size of 5; and 3) the -postsw option. The BLAST sequence identity was calculated based on the alignment of the first BLAST HSP (High-scoring Segment Pair) of the identified potential functional homolog sequence with a specific reference polypeptide. The number of identically matched residues in the BLAST HSP alignment was divided by the HSP length, and then multiplied by 100 to obtain the BLAST sequence identity. The HSP length typically included gaps in the alignment, but in some cases the gaps were excluded.
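As a worked illustration of the identity calculation described above, the following minimal sketch computes a percent identity from the two aligned strings of a single HSP. The function name `blast_sequence_identity`, the `include_gaps` switch, and the use of "-" as the gap character are assumptions made for this example; they are not taken from the BLASTP program or from the specification.

```python
def blast_sequence_identity(query_aln, subject_aln, include_gaps=True):
    """Percent identity over one HSP given its two aligned strings.

    identical matches / HSP length * 100, where the HSP length either
    includes gapped columns (the typical case) or excludes them.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")

    columns = list(zip(query_aln, subject_aln))
    # A column counts as identical only if both residues match and
    # neither position is a gap.
    identical = sum(1 for q, s in columns if q == s and q != "-")

    if include_gaps:
        hsp_length = len(columns)
    else:
        hsp_length = sum(1 for q, s in columns if q != "-" and s != "-")

    if hsp_length == 0:
        raise ValueError("HSP has no alignable columns")
    return 100.0 * identical / hsp_length
```

For an HSP of seven columns with five identical residues and one gapped column, the identity is 5/7 (about 71.4%) when gaps are counted in the length and 5/6 (about 83.3%) when they are excluded, which shows why the choice of denominator matters near the 80% threshold.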
[0271] The main reciprocal BLAST process consists of two rounds of BLAST searches: a forward search and a reverse search. In the forward search round, a reference polypeptide sequence, “polypeptide A”, from source species AS was BLASTed against all protein sequences of a species of interest. Top hits were determined using an E-value cutoff of 10⁻⁵ and a sequence identity cutoff of 35%. Among the top hits, the sequence with the lowest E-value was designated as the best hit and considered a potential functional homolog or ortholog. Any other top hit with a sequence identity of 80% or more to the best hit or to the original reference polypeptide was also considered a potential functional homolog or ortholog. This process was repeated for all species of interest.
[0272] In the reverse search round, the top hits identified in the forward search from all species were BLASTed against all protein sequences of source species AS. A top hit from the forward search that returned a polypeptide from the pool described above as its best hit was also considered a potential functional homolog.
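The two search rounds can be summarized in a short sketch that operates on already-computed BLAST results rather than running BLAST itself. Everything named here (`ForwardHit`, `top_hits`, `reciprocal_candidates`, and the `reverse_best` mapping) is hypothetical. As a simplification, the 80% rule is applied only to identity against the reference polypeptide, because identity against the best hit would require additional pairwise data that this toy input does not carry.

```python
from typing import NamedTuple


class ForwardHit(NamedTuple):
    """One hit from the forward (direct) search in a species of interest."""
    subject_id: str
    e_value: float
    percent_identity: float  # identity to the reference polypeptide


def top_hits(hits, max_e=1e-5, min_identity=35.0):
    """Keep hits that pass the E-value and sequence identity cutoffs."""
    return [h for h in hits
            if h.e_value <= max_e and h.percent_identity >= min_identity]


def reciprocal_candidates(forward_hits, reverse_best, pool):
    """Collect potential functional homologs for one species of interest.

    reverse_best maps a forward top hit to the ID of its best hit when
    searched back against the source species (the reverse search).
    """
    kept = top_hits(forward_hits)
    if not kept:
        return set()

    # Forward round: the lowest E-value hit is the best hit.
    best = min(kept, key=lambda h: h.e_value)
    candidates = {best.subject_id}
    # Simplified 80% rule: identity to the reference only.
    candidates.update(h.subject_id for h in kept if h.percent_identity >= 80.0)
    # Reverse round: the hit must point back into the reference pool.
    candidates.update(h.subject_id for h in kept
                      if reverse_best.get(h.subject_id) in pool)
    return candidates
```

Candidates produced this way are only potential homologs; in the procedure described here they are still subject to manual inspection.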
[0273] Functional homologs were identified by manual inspection of potential functional homolog sequences. Representative functional homologs for SEQ ID NOs: 483, 562, 246, 111, 348, 774, 416, 2, 157, 280, 641, and 26 are shown in Figures 1-12, respectively. Additional exemplary homologs are correlated to certain Figures in the Sequence Listing.
EXAMPLE 10 Determination of functional homologs by hidden Markov models
[0274] Hidden Markov models (HMMs) were generated by the HMMER 2.3.2 program. To generate each HMM, the default parameters of the HMMER 2.3.2 program, configured for global alignments, were used.
[0275] An HMM was generated using the sequences shown in Figure 1 as input. These sequences were fitted to the model, and a representative HMM score for each sequence is shown in the Sequence Listing. Additional sequences were fitted to the model, and representative HMM scores for such additional sequences are shown in the Sequence Listing. The results indicate that these additional sequences are functional homologs of SEQ ID NO: 483.
[0276] The above procedure was repeated and an HMM was generated for each group of sequences shown in Figures 2-12, using the sequences shown in each Figure as input for that HMM. A representative HMM score for each sequence is shown in the Sequence Listing. Additional sequences were fitted to certain HMMs, and representative HMM scores for such additional sequences are shown in the Sequence Listing. The results indicate that these additional sequences are functional homologs of the sequences used to generate that HMM.
Other embodiments
[0277] It should be understood that, while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
Claims:
Claims (9)
[0001]
1. Method for producing a plant, characterized by comprising growing a plant cell comprising an exogenous nucleic acid, said exogenous nucleic acid comprising a regulatory region operably linked to a nucleotide sequence, the nucleotide sequence comprising the polynucleotide sequence of SEQ ID NO: 1, wherein a plant produced from said plant cell has an increase in sucrose content or an increase in conversion efficiency compared to a control plant that does not comprise said nucleic acid.
[0002]
2. Method according to claim 1, characterized in that the increase in sucrose content or increase in conversion efficiency in said plant is an increase in sucrose content.
[0003]
3. Method according to claim 1, characterized in that the increase in sucrose content or increase in conversion efficiency in said plant is an increase in conversion efficiency.
[0004]
4. Method for modulating the biomass composition in a plant, characterized by comprising introducing into a plant cell an exogenous nucleic acid, said exogenous nucleic acid comprising a regulatory region operably linked to a nucleotide sequence, the nucleotide sequence comprising the polynucleotide sequence of SEQ ID NO: 1, wherein a plant produced from said plant cell has an increase in sucrose content or an increase in conversion efficiency compared to a control plant that does not comprise said exogenous nucleic acid.
[0005]
5. Method according to claim 4, characterized in that the increase in sucrose content or increase in conversion efficiency in said plant is an increase in sucrose content.
[0006]
6. Method according to claim 4, characterized in that the increase in sucrose content or increase in conversion efficiency in said plant is an increase in conversion efficiency.
[0007]
7. Isolated nucleic acid, characterized by comprising the polynucleotide sequence of SEQ ID NO: 1 operably linked to a heterologous regulatory nucleotide sequence.
[0008]
8. Method for altering the biomass composition in a plant, characterized by comprising modifying the endogenous polynucleotide sequence of SEQ ID NO: 1, said plant having a difference in biomass composition compared to the corresponding composition of a control plant in which said nucleic acid has not been modified.
[0009]
9. Method according to claim 8, characterized in that it further comprises selecting plants having altered biomass composition.
Patent family:
Publication number | Publication date
BR112013010278A2|2016-07-05|
US20130191943A1|2013-07-25|
US20180105828A1|2018-04-19|
US20200095599A1|2020-03-26|
US9828608B2|2017-11-28|
BR132013011046E2|2019-06-25|
WO2012058223A1|2012-05-03|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title

US4987071A|1986-12-03|1991-01-22|University Patents, Inc.|RNA ribozyme polymerases, dephosphorylases, restriction endoribonucleases and methods|
US5254678A|1987-12-15|1993-10-19|Gene Shears Pty. Limited|Ribozymes|
US5766847A|1988-10-11|1998-06-16|Max-Planck-Gesellschaft Zur Forderung Der Wissenschaften E.V.|Process for analyzing length polymorphisms in DNA regions|
US5034323A|1989-03-30|1991-07-23|Dna Plant Technology Corporation|Genetic engineering of novel plant phenotypes|
US5231020A|1989-03-30|1993-07-27|Dna Plant Technology Corporation|Genetic engineering of novel plant phenotypes|
US6946587B1|1990-01-22|2005-09-20|Dekalb Genetics Corporation|Method for preparing fertile transgenic corn plants|
US5484956A|1990-01-22|1996-01-16|Dekalb Genetics Corporation|Fertile transgenic Zea mays plant comprising heterologous DNA encoding Bacillus thuringiensis endotoxin|
US5204253A|1990-05-29|1993-04-20|E. I. Du Pont De Nemours And Company|Method and apparatus for introducing biological substances into living cells|
PT969102E|1991-09-24|2008-03-25|Keygene Nv|Primers, kits and sets of restriction fragments used in selective restriction fragment amplification|
US6326527B1|1993-08-25|2001-12-04|Dekalb Genetics Corporation|Method for altering the nutritional content of plant seed|
US5878215A|1994-05-23|1999-03-02|Mastercard International Incorporated|System and method for processing multiple electronic transaction requests|
AU710874B2|1995-06-30|1999-09-30|Dna Plant Technology Corporation|Delayed ripening tomato plants|
JPH10117776A|1996-10-22|1998-05-12|Japan Tobacco Inc|Transformation of indica rice|
GB9703146D0|1997-02-14|1997-04-02|Innes John Centre Innov Ltd|Methods and means for gene silencing in transgenic plants|
US6114608A|1997-03-14|2000-09-05|Novartis Ag|Nucleic acid construct comprising bacillus thuringiensis cry1Ab gene|
GB9710475D0|1997-05-21|1997-07-16|Zeneca Ltd|Gene silencing|
US6452067B1|1997-09-19|2002-09-17|Dna Plant Technology Corporation|Methods to assay for post-transcriptional suppression of gene expression|
US6506559B1|1997-12-23|2003-01-14|Carnegie Institute Of Washington|Genetic inhibition by double-stranded RNA|
AUPP249298A0|1998-03-20|1998-04-23|Ag-Gene Australia Limited|Synthetic genes and genetic constructs comprising same I|
US20040214330A1|1999-04-07|2004-10-28|Waterhouse Peter Michael|Methods and means for obtaining modified phenotypes|
US20090087878A9|1999-05-06|2009-04-02|La Rosa Thomas J|Nucleic acid molecules associated with plants|
US20100293669A2|1999-05-06|2010-11-18|Jingdong Liu|Nucleic Acid Molecules and Other Molecules Associated with Plants and Uses Thereof for Plant Improvement|
US6423885B1|1999-08-13|2002-07-23|Commonwealth Scientific And Industrial Research Organization |Methods for obtaining modified phenotypes in plant cells|
GB9925459D0|1999-10-27|1999-12-29|Plant Bioscience Ltd|Gene silencing|
US7858848B2|1999-11-17|2010-12-28|Mendel Biotechnology Inc.|Transcription factors for increasing yield|
US7834146B2|2000-05-08|2010-11-16|Monsanto Technology Llc|Recombinant polypeptides associated with plants|
WO2002059257A2|2000-10-31|2002-08-01|Commonwealth Scientific And Industrial Research Organisation|Method and means for producing barley yellow dwarf virus resistant cereal plants|
WO2002046449A2|2000-12-07|2002-06-13|The Penn State Research Foundation|Selection of catalytic nucleic acids targeted to infectious agents|
USPP13008P2|2001-03-19|2002-09-24|Hortech, Inc.|Miscanthus sinensis plant named ‘Little Zebra’|
EP1402042A2|2001-06-22|2004-03-31|Syngenta Participations AG|Abiotic stress responsive polynucleotides and polypeptides|
WO2003000038A2|2001-06-22|2003-01-03|The Regents Of The University Of California|Compositions and methods for modulating plant development|
US7112429B2|2001-07-28|2006-09-26|Midwest Research Institute|Thermal tolerant mannanase from acidothermus cellulolyticus|
GB0119342D0|2001-08-08|2001-10-03|Gemstar Cambridge Ltd|Starch modification|
US20020182701A1|2001-08-30|2002-12-05|Saint Louis University|Dominant negative variants of methionine aminopeptidase 2 and clinical uses thereof|
US7576262B2|2002-03-14|2009-08-18|Commonwealth Scientific And Industrial Research Organization|Modified gene-silencing RNA and uses thereof|
US20030175783A1|2002-03-14|2003-09-18|Peter Waterhouse|Methods and means for monitoring and modulating gene silencing|
USPP14743P2|2003-01-21|2004-05-04|C. Greg Speichert|Miscanthus plant named ‘Gilded Tower’|
USPP16176P3|2003-03-27|2006-01-03|Cosner Harlan B|Impatiens plant named ‘TiWhit’|
US20060260004A1|2004-04-01|2006-11-16|Yiwen Fang|Par-related protein promoters|
US7429692B2|2004-10-14|2008-09-30|Ceres, Inc.|Sucrose synthase 3 promoter from rice and uses thereof|
US7402667B2|2003-10-14|2008-07-22|Ceres, Inc.|Promoter, promoter control elements, and combinations, and uses thereof|
US20060015970A1|2003-12-12|2006-01-19|Cers, Inc.|Nucleotide sequences and polypeptides encoded thereby useful for modifying plant characteristics|
USPP15193P2|2004-01-05|2004-09-28|Michael Vern Smith|Miscanthus plant named ‘Gold Bar’|
US20070006335A1|2004-02-13|2007-01-04|Zhihong Cook|Promoter, promoter control elements, and combinations, and uses thereof|
WO2005098007A2|2004-04-01|2005-10-20|Ceres, Inc.|Promoter, promoter control elements, and combinations, and uses thereof|
US7214789B2|2004-06-30|2007-05-08|Ceres, Inc.|Promoter, promoter control elements, and combinations, and uses thereof|
US7598367B2|2005-06-30|2009-10-06|Ceres, Inc.|Early light-induced protein promoters|
WO2006023766A2|2004-08-20|2006-03-02|Ceres Inc.|P450 polynucleotides, polypeptides, and uses thereof|
US8137961B2|2004-09-08|2012-03-20|J.R. Simplot Company|Plant-specific genetic elements and transfer cassettes for plant transformation|
US7279617B2|2004-09-22|2007-10-09|Ceres, Inc.|Promoter, promoter control elements, and combinations, and uses thereof|
US7378571B2|2004-09-23|2008-05-27|Ceres, Inc.|Promoter, promoter control elements, and combinations, and uses thereof|
WO2006115575A1|2005-04-20|2006-11-02|Ceres Inc.|Regulatory regions from papaveraceae|
EP2422615B1|2005-07-29|2014-06-18|Targeted Growth, Inc.|Dominant negative mutant krp protein protection of active cyclin-cdk complex inhibition by wild-type krp|
US7244879B2|2005-10-12|2007-07-17|Ceres, Inc.|Nucleotide sequences and polypeptides encoded thereby useful for modifying plant characteristics in response to cold|
WO2007055826A1|2005-11-04|2007-05-18|Ceres, Inc.|Modulation of fertility in monocots|
WO2007127501A2|2006-01-09|2007-11-08|Ceres, Inc.|Novel endosperm regulatory regions|
WO2007120989A2|2006-02-24|2007-10-25|Ceres, Inc.|Shade regulatory regions|
USPP18161P2|2006-06-15|2007-10-30|Probst Darrell R|Miscanthus plant named ‘Super Stripe’|
EP2543735A1|2007-06-06|2013-01-09|Monsanto Technology LLC|Genes and uses for plant enhancement|
GB0718377D0|2007-09-21|2007-10-31|Cambridge Entpr Ltd|Improvements in or relating to organic compounds|
US8362325B2|2007-10-03|2013-01-29|Ceres, Inc.|Nucleotide sequences and corresponding polypeptides conferring modulated plant characteristics|
BRPI0818924B1|2007-11-02|2020-04-14|Ceres Inc|method of formulating a nir model|
WO2009099899A2|2008-02-01|2009-08-13|Ceres, Inc.|Promoter, promoter control elements, and combinations, and use thereof|
CN102027115A|2008-03-31|2011-04-20|希尔雷斯股份有限公司|Promoter, promoter control elements, and combinations, and uses thereof|
KR20150046788A|2009-04-29|2015-04-30|바스프 플랜트 사이언스 게엠베하|Plants having enhanced yield-related traits and a method for making the same|
WO2011044254A1|2009-10-07|2011-04-14|Ceres, Inc.|Transgenic plants having enhanced biomass composition|
WO2011140329A1|2010-05-06|2011-11-10|Ceres, Inc.|Transgenic plants having increased biomass|
WO2012009551A1|2010-07-16|2012-01-19|Ceres, Inc.|Promoter, promoter control elements, and combinations, and uses thereof|
BR112013010278B1|2010-10-27|2020-12-29|Ceres, Inc|method to produce a plant, method to modulate the biomass composition in a plant, isolated nucleic acid and method to alter the biomass composition in a plant|
WO2013075817A1|2011-11-21|2013-05-30|Bayer Intellectual Property Gmbh|Fungicide n-[methyl]-carboxamide derivatives|
CN105906567B|2011-11-30|2019-01-22|拜耳知识产权有限责任公司|Antifungal N- bicyclic alkyl and N- tricyclic alkylcarboxamide derivative|
TWI557120B|2011-12-29|2016-11-11|拜耳知識產權公司|Fungicidal 3-[methyl]-2-substituted-1,2,4-oxadiazol-5-one derivatives|
TWI558701B|2011-12-29|2016-11-21|拜耳知識產權公司|Fungicidal 3-[methyl]-2-sub stituted-1,2,4-oxadiazol-5-one derivatives|
CN105008540A|2012-09-20|2015-10-28|纳幕尔杜邦公司|Compositions and methods conferring resistance of maize to corn rootworm ii|
CA2886104A1|2012-09-20|2014-03-27|E.I. Du Pont De Nemours And Company|Management of corn rootworm and other insect pests|
EP2908641B1|2012-10-19|2018-01-10|Bayer Cropscience AG|Method for treating plants against fungi resistant to fungicides using carboxamide or thiocarboxamide derivatives|
UA114822C2|2012-10-19|2017-08-10|Байєр Кропсайнс Аг|Active compound combinations comprising carboxamide derivatives|
EP2908640B1|2012-10-19|2019-10-02|Bayer Cropscience AG|Method of plant growth promotion using carboxamide derivatives|
UA114647C2|2012-10-19|2017-07-10|Байєр Кропсайнс Аг|Method for enhancing tolerance to abiotic stress in plants using carboxamide or thiocarboxamide derivatives|
CA2909725A1|2013-04-19|2014-10-23|Bayer Cropscience Aktiengesellschaft|Method for improved utilization of the production potential of transgenic plants|
WO2014177514A1|2013-04-30|2014-11-06|Bayer Cropscience Ag|Nematicidal n-substituted phenethylcarboxamides|
TW201507722A|2013-04-30|2015-03-01|Bayer Cropscience Ag|N-carboxamides as nematicides and endoparasiticides|
EP2837287A1|2013-08-15|2015-02-18|Bayer CropScience AG|Use of prothioconazole for increasing root growth of Brassicaceae|
US9101100B1|2014-04-30|2015-08-11|Ceres, Inc.|Methods and materials for high throughput testing of transgene combinations|
US10480000B2|2014-07-15|2019-11-19|Ceres, Inc.|Methods of increasing crop yield under abiotic stress|
US20180072972A1|2016-09-09|2018-03-15|Alpha Revolution, Inc.|Systems, devices and methods for fermenting beverages|
WO2019233863A1|2018-06-04|2019-12-12|Bayer Aktiengesellschaft|Herbicidally active bicyclic benzoylpyrazoles|
CN111172172B|2020-02-18|2021-02-12|南京林业大学|Regulatory gene PdeMIXTA02 for initial development of populus deltoides and application thereof|
Legal status:
2018-04-03| B06F| Objections, documents and/or translations needed after an examination request according art. 34 industrial property law|
2019-05-28| B06T| Formal requirements before examination|
2019-10-01| B06A| Notification to applicant to reply to the report for non-patentability or inadequacy of the application according art. 36 industrial patent law|
2020-03-31| B06A| Notification to applicant to reply to the report for non-patentability or inadequacy of the application according art. 36 industrial patent law|
2020-09-15| B09A| Decision: intention to grant|
2020-12-29| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 25/10/2011, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
US40728010P| true| 2010-10-27|2010-10-27|
US61/407,280|2010-10-27|
PCT/US2011/057709|WO2012058223A1|2010-10-27|2011-10-25|Transgenic plants having altered biomass composition|